thr3ads.net - similar to: "[LLVMdev] Enabling the Loop Vectorizer for a few hours - heads up"

[LLVMdev] Enabling the vectorizer for -Os -- ping

2013 Jun 14

Hi, Last week I wrote llvm-dev and presented data that shows how enabling the vectorizer on -Os can improve the performance of many workloads and that it has negligible effects on code size. I also added a command line switch to make it easier for people to benchmark the vectorizer using -Os directly from clang without changing LLVM. Has anyone done any benchmarks on -Os + vectorization ?

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote:
> On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
>> Hi,
>> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it

[LLVMdev] Enabling the vectorizer for -Os -- ping

2013 Jun 14

Hi Nadav,
No noticeable difference between "-Os" and "-Os -fvectorize" in code size or compilation times in my tests, and only minimal performance improvements (small enough to be ignored).
cheers,
--renato
On 14 June 2013 09:29, Renato Golin <renato.golin at linaro.org> wrote:
> On 14 June 2013 05:37, Nadav Rotem <nrotem at apple.com> wrote:

[LLVMdev] Enabling the vectorizer for -Os -- ping

2013 Jun 14

On 14 June 2013 05:37, Nadav Rotem <nrotem at apple.com> wrote:
> Last week I wrote llvm-dev and presented data that shows how enabling the vectorizer on -Os can improve the performance of many workloads and that it has negligible effects on code size. I also added a command line switch to make it easier for people to benchmark the vectorizer using -Os directly from

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

Nadav, I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to vectorization. I can't tell if it's the loop vectorizer or the codegen at fault, but the alignment assumption seems to sneak in somewhere. v/r, Josh [1]

[LLVMdev] Vectorization factor limitation in Loop Vectorizer

2014 Dec 13

So IMO, if we modify the VF calculation for targets/subtargets using TTI where a higher VF is supported, the vectorizer’s scope will become wider. Did/do you foresee any issue with this?
Thanks,
Shahid
From: Nadav Rotem [mailto:nrotem at apple.com]
Sent: Saturday, December 13, 2014 2:47 AM
To: Shahid, Asghar-ahmad
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Vectorization factor limitation in
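The arithmetic behind a register-width-driven VF can be sketched in C++. This is a simplified model, not the Loop Vectorizer's actual code: it assumes the upper bound on VF is the widest vector register width in bits (the value TTI exposes through `getRegisterBitWidth()`, which comes up elsewhere in these threads) divided by the width of the widest scalar type in the loop, rounded down to a power of two. The function name is hypothetical.

```cpp
// Simplified model (not LLVM's implementation): the largest vectorization
// factor is how many lanes of the widest scalar type in the loop fit into
// one vector register. Returns 1 ("do not vectorize") when the target has
// no vector registers or the element is wider than the register.
unsigned maxVectorizationFactor(unsigned registerBits, unsigned widestTypeBits) {
    if (registerBits == 0 || widestTypeBits == 0 || widestTypeBits > registerBits)
        return 1;
    unsigned lanes = registerBits / widestTypeBits;
    // Round down to a power of two, since vector lane counts are powers of two.
    unsigned vf = 1;
    while (vf * 2 <= lanes)
        vf *= 2;
    return vf;
}
```

Under this model, 128-bit SSE registers with 32-bit floats give a VF of 4 and 256-bit AVX registers give 8, so a target that reports wider registers through TTI raises the cap, which is the widening Shahid describes.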

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 23

Hi,
Sorry for the delay in response. I measured the code size change and noticed small changes in both directions for individual programs. I found a 30k binary size growth for the entire testsuite + SPEC. I attached an updated performance report that includes both compile time and performance measurements.
Thanks,
Nadav
On Jul 14, 2013, at 10:55 PM, Nadav Rotem <nrotem at apple.com>

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

Something like:
index 6db7f68..68564cb 100644
--- a/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1208,6 +1208,8 @@ void InnerLoopVectorizer::vectorizeMemoryInstruction(Instr
   Type *DataTy = VectorType::get(ScalarDataTy, VF);
   Value *Ptr = LI ? LI->getPointerOperand() : SI->getPointerOperand();
   unsigned Alignment = LI ?

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

Hi Nadav,
Okay.
1. The comment doesn't make this clear. I would suggest, at a minimum, updating it to mention pairs specifically, to avoid the issue in #2.
2. If the day comes when the selectiondag store vectorizer handles more than pairs, and does so better, is anyone really going to remember this random 3 exists in the other vectorizer? I would posit, based on experience, the answer is

[LLVMdev] Enabling the vectorizer for -Os -- ping

2013 Jun 14

Sorry for the delays here. I am running our benchmark suite and will have data in a day or so.
On Jun 13, 2013 9:40 PM, "Nadav Rotem" <nrotem at apple.com> wrote:
> Hi,
> Last week I wrote llvm-dev and presented data that shows how enabling the vectorizer on -Os can improve the performance of many workloads and that it has negligible effects on code size. I

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

A fix for this is in r194876. Thanks for reporting this!
On Nov 15, 2013, at 3:49 PM, Joshua Klontz <josh.klontz at gmail.com> wrote:
> Nadav,
> I believe aligned accesses to unaligned pointers is precisely the issue. Consider the function `add_u8S` before[1] and after[2] the loop vectorizer pass. There is no alignment assumption associated with %kernel_data prior to

[LLVMdev] Does loop vectorizer inquire about target's SIMD capabilities?

2013 Jan 03

Hi Akira!
> Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop?
Yes, it uses a cost model to determine the profitability of vectorization. At the moment only x86 provides the necessary hooks that are needed for calculating the costs. We may need to change the cost defaults to
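A cost-model-based choice of VF might look roughly like the sketch below. It is an illustration under stated assumptions, not the Loop Vectorizer's real interface: `vectorLoopCost` is a hypothetical callback standing in for the target cost hooks, and `maxVF` is assumed to come from the target, so a target without SIMD registers (such as the standard mips32 case Akira asks about) would pass 1 and the loop would stay scalar.

```cpp
#include <functional>

// Hypothetical cost-model sketch: pick the VF with the lowest estimated cost
// per scalar iteration. vectorLoopCost(vf) is an assumed callback returning
// the estimated cost of one vector iteration that processes vf elements;
// vectorLoopCost(1) is the cost of one scalar iteration.
unsigned selectVectorizationFactor(unsigned maxVF,
                                   const std::function<double(unsigned)>& vectorLoopCost) {
    unsigned bestVF = 1;
    double bestCostPerLane = vectorLoopCost(1);
    for (unsigned vf = 2; vf <= maxVF; vf *= 2) {
        double costPerLane = vectorLoopCost(vf) / vf;
        if (costPerLane < bestCostPerLane) {  // only switch if strictly cheaper
            bestCostPerLane = costPerLane;
            bestVF = vf;
        }
    }
    return bestVF;  // 1 means "not profitable: keep the scalar loop"
}
```

Requiring a strictly lower per-element cost means ties keep the scalar loop, a conservative choice when the cost numbers are only estimates.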

[LLVMdev] Does loop vectorizer inquire about target's SIMD capabilities?

2013 Jan 03

Nadav (or anyone who is familiar with the loop vectorizer), Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop? I am asking this because I would like to have loop vectorization disabled for targets that don't support SIMD instructions (for example, standard mips32). Loop vectorization

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 14

Hi, LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP vectorizer on a Sandybridge mac (using SSE4, w/o AVX). Based on my performance measurements
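For readers unfamiliar with SLP vectorization, the following C++ function is a hypothetical example of the straight-line code the pass targets: four independent, isomorphic additions on adjacent elements. Whether a given clang version actually merges them into one vector operation depends on the target's cost model; compiling with `-O3 -fslp-vectorize -S -emit-llvm` and reading the IR is one way to check. `__restrict` is a GCC/Clang/MSVC extension used here to tell the compiler the arrays do not overlap.

```cpp
// Straight-line code of the kind the SLP vectorizer looks for: four
// independent, isomorphic float additions on adjacent memory. An SLP pass
// may combine them into a single 4-wide vector load/add/store if the
// target's cost model considers it profitable.
void add4(float* __restrict out, const float* __restrict a, const float* __restrict b) {
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}
```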

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

Hey Nadav,
I'd humbly suggest that rather than use 3 directly, you should add a shared constant between these two passes, so when one changes, the other doesn't need to be updated. It would also ensure this bit of info about what needs to be updated isn't only contained in the comments..
On Fri, Jul 26, 2013 at 4:07 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Author:

[LLVMdev] Enabling the SLP vectorizer by default for -O3

2013 Jul 15

On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi,
> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it can use the clang command line flag “-fslp-vectorize”. I ran LLVM’s test suite with and without the SLP

[LLVMdev] [llvm] r187267 - SLP Vectorier: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize.

2013 Jul 27

Hi Daniel,
Maybe my commit message was not clear. The idea is that the SelectionDAG store vectorizer can only handle pairs. So, the number three means "more than a pair".
Thanks,
Nadav
Sent from my iPhone
> On Jul 26, 2013, at 17:48, Daniel Berlin <dberlin at dberlin.org> wrote:
> Hey Nadav,
> I'd humbly suggest that rather than use 3 directly, you should
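Daniel's suggestion of a shared constant could be sketched as below. The namespace, constant, and function names are hypothetical and do not correspond to anything in LLVM; the point is that the "more than a pair" rule Nadav describes is derived from one definition instead of a bare 3 in the SLP vectorizer.

```cpp
// Hypothetical shared header included by both vectorizers.
namespace vectorizer_limits {
// The SelectionDAG store vectorizer only merges pairs of stores.
constexpr unsigned MaxChainHandledByDAGStoreVectorizer = 2;
// The SLP vectorizer therefore starts at "more than a pair".
constexpr unsigned MinSLPChainLength = MaxChainHandledByDAGStoreVectorizer + 1;
}  // namespace vectorizer_limits

// Hypothetical SLP-side check: leave pairs to the SelectionDAG pass.
bool slpShouldVectorizeChain(unsigned chainLength) {
    return chainLength >= vectorizer_limits::MinSLPChainLength;
}
```

If the SelectionDAG pass later learned to handle longer chains, only `MaxChainHandledByDAGStoreVectorizer` would need to change, which addresses Daniel's concern about someone having to remember the literal.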

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

----- Original Message -----
> From: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> To: "Joshua Klontz" <josh.klontz at gmail.com>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>
> Sent: Friday, November 15, 2013 4:05:53 PM
> Subject: Re: [LLVMdev] Limit loop vectorizer to SSE
> Something like:
> index

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

On Nov 15, 2013, at 12:36 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 15 November 2013 20:24, Joshua Klontz <josh.klontz at gmail.com> wrote:
> Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks!
> I think
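One source-level way to get the effect of a runtime alignment check is loop peeling: run scalar iterations until the destination pointer reaches a 16-byte boundary, process aligned 16-byte blocks, then finish the tail with scalar code. The sketch below is a hypothetical byte-wise add (loosely modelled on the `add_u8S` kernel mentioned in this thread, whose real code is not shown here) and does not describe any specific LLVM pass. Only the destination is aligned by the prologue, so the loads from `a` and `b` may remain unaligned.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of leading elements to handle with scalar code
// before `p` reaches an `alignment`-byte boundary. Assumes `alignment` is a
// power of two and a multiple of sizeof(T), and that p is aligned to sizeof(T).
template <typename T>
std::size_t peelCount(const T* p, std::size_t alignment) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    std::size_t misalignment = addr & (alignment - 1);
    return misalignment == 0 ? 0 : (alignment - misalignment) / sizeof(T);
}

// dst[i] = a[i] + b[i] on bytes (wrapping). The main loop works on 16-byte
// blocks of dst that start on a 16-byte boundary, which a vectorizer may turn
// into aligned vector stores; prologue and epilogue are plain scalar loops.
void addU8Peeled(std::uint8_t* dst, const std::uint8_t* a, const std::uint8_t* b,
                 std::size_t n) {
    constexpr std::size_t kAlign = 16;
    std::size_t prologue = peelCount(dst, kAlign);
    if (prologue > n) prologue = n;
    std::size_t i = 0;
    for (; i < prologue; ++i)  // scalar prologue until dst is aligned
        dst[i] = static_cast<std::uint8_t>(a[i] + b[i]);
    for (; i + kAlign <= n; i += kAlign)  // aligned 16-byte blocks
        for (std::size_t j = 0; j < kAlign; ++j)
            dst[i + j] = static_cast<std::uint8_t>(a[i + j] + b[i + j]);
    for (; i < n; ++i)  // scalar epilogue for the remaining tail
        dst[i] = static_cast<std::uint8_t>(a[i] + b[i]);
}
```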

[LLVMdev] Limit loop vectorizer to SSE

2013 Nov 15

Yes, I was just about to send out:
DL->getABITypeAlignment(ScalarDataTy);
The question is: “… ABI alignment for the target …” is that getPrefTypeAlignment or getABITypeAlignment? I would have thought the latter.
On Nov 15, 2013, at 4:12 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Arnold Schwaighofer"
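The alignment choice discussed here can be modelled in a few lines. In LLVM IR of this period, an alignment of 0 on a load or store meant "the ABI alignment of the accessed type", so widening an `i8` access to a vector access while keeping 0 would implicitly promise the vector type's larger alignment. Falling back to the scalar type's ABI alignment, as in the `DL->getABITypeAlignment(ScalarDataTy)` line in this message, avoids that. The struct and function below are hypothetical illustrations, not LLVM API.

```cpp
// Toy model (hypothetical names, not LLVM's API) of one scalar memory access
// that is about to be widened into a vector access.
struct MemAccess {
    unsigned alignment;           // 0 = "use the ABI alignment of the type"
    unsigned scalarABIAlignment;  // ABI alignment of the scalar type, e.g. 1 for i8
};

// Alignment to put on the widened vector access: keep an explicit alignment,
// otherwise spell out the scalar type's ABI alignment instead of leaving 0,
// which would be read as the (larger) vector type's ABI alignment.
unsigned widenedAccessAlignment(const MemAccess& access) {
    return access.alignment != 0 ? access.alignment : access.scalarABIAlignment;
}
```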
