similar to: [LLVMdev] Enabling the vectorizer for -Os -- ping

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Enabling the vectorizer for -Os -- ping"

2013 Jun 14
0
[LLVMdev] Enabling the vectorizer for -Os -- ping
Sorry for the delays here. I am running our benchmark suite and will have data in a day or so. On Jun 13, 2013 9:40 PM, "Nadav Rotem" <nrotem at apple.com> wrote: > Hi, > > Last week I wrote llvm-dev and presented data that shows how enabling the > vectorizer on -Os can improve the performance of many workloads and that it > has negligible effects on code size. I
2013 Jun 14
1
[LLVMdev] Enabling the vectorizer for -Os -- ping
Hi Nadav, No noticeable difference between "-Os" and "-Os -fvectorize" in code size or compilation times in my tests, and only minimal performance improvements (small enough to be ignored). cheers, --renato On 14 June 2013 09:29, Renato Golin <renato.golin at linaro.org> wrote: > On 14 June 2013 05:37, Nadav Rotem <nrotem at apple.com> wrote: > >>
2013 Jun 14
0
[LLVMdev] Enabling the vectorizer for -Os -- ping
On 14 June 2013 05:37, Nadav Rotem <nrotem at apple.com> wrote: > Last week I wrote llvm-dev and presented data that shows how enabling the > vectorizer on -Os can improve the performance of many workloads and that it > has negligible effects on code size. I also added a command line switch to > make it easier for people to benchmark the vectorizer using -Os directly > from
2013 Jun 16
3
[LLVMdev] Enabling the vectorizer for -Os -- ping
All I have to say is *wow*. The vectorizer performs *remarkably* better now than it did the last time I benchmarked it. I'm stunned. I measured -O2 and -Os, as well as -march=x86-64 and -march=corei7-avx. My hope with the latter two was to cover both worst-case and best-case in terms of the quality of the vector ISA available. First, binary size growth. This is measured on average across a
2013 May 10
2
[LLVMdev] Simple Loop Vectorize Question
Nadav, Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S -debug double.ll' doesn't appear to make a difference. In fact it seems to be ignored as garbage values for -mcpu don't raise an error. Am I overlooking something else also? Many Thanks, Josh On Thu, May 9, 2013 at 6:06 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Josh, > > Your
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > >> Hi, >> >> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it
2013 May 10
0
[LLVMdev] Simple Loop Vectorize Question
Hi Josh, This line works for me: opt file.ll -loop-vectorize -S -o - -mtriple=x86_64 -mcpu=corei7-avx -debug You need to specify the triple on the command line if it is not inside the module. Thanks, Nadav On May 9, 2013, at 5:53 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > Nadav, > > Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S
2013 Jul 14
6
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi, LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements
2013 Jul 23
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
Hi, Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements. Thanks, Nadav On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>
2013 Jun 05
15
[LLVMdev] Enabling the vectorizer for -Os
Hi, I would like to start a discussion about enabling the loop vectorizer by default for -Os. The loop vectorizer can accelerate many workloads and enabling it for -Os and -O2 has obvious performance benefits. At the same time the loop vectorizer can increase the code size because of two reasons. First, to vectorize some loops we have to keep the original loop around in order to handle the last
2013 Jul 15
0
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi, > > LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP
2013 Jul 27
2
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hey Nadav, I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments.. On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote: > Author:
2013 Nov 15
6
[LLVMdev] Limit loop vectorizer to SSE
On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote: > On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote: > Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks! > > I think
2013 Nov 15
0
[LLVMdev] Limit loop vectorizer to SSE
Nadav, I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. v/r, Josh [1]
2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we modify the VF calculation for targets/subtargets using TTI where higher VF is supported The vectorizer’s scope will become wider. Did/do you foresee any issue with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in
2013 Jul 27
1
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Nadav, Okay. 1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2 2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is
2013 Aug 20
3
[LLVMdev] Failure to optimize vector select
On Aug 20, 2013, at 10:22 , Nadav Rotem <nrotem at apple.com> wrote: > Can you send the IR of the function ? Attached is the -O0 and -O3 IR -------------- next part -------------- A non-text attachment was scrubbed... Name: vselect_optimized.ll Type: application/octet-stream Size: 1545 bytes Desc: not available URL:
2013 Nov 15
4
[LLVMdev] Limit loop vectorizer to SSE
Something like: index 6db7f68..68564cb 100644 --- a/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -1208,6 +1208,8 @@ void InnerLoopVectorizer::vectorizeMemoryInstruction(Instr Type *DataTy = VectorType::get(ScalarDataTy, VF); Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand(); unsigned Alignment = LI ?
2013 Jul 27
0
[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.
Hi Daniel, Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair". Thanks, Nadav Sent from my iPhone > On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote: > > Hey Nadav, > I'd humbly suggest that rather than use 3 directly, you should
2013 Jun 05
0
[LLVMdev] Enabling the vectorizer for -Os
On 5 June 2013 04:26, Nadav Rotem <nrotem at apple.com> wrote: > I would like to start a discussion about enabling the loop vectorizer by > default for -Os. The loop vectorizer can accelerate many workloads and > enabling it for -Os and -O2 has obvious performance benefits. Hi Nadav, As it stands, O2 is very similar to O3 with a few, more aggressive, optimizations running,