thr3ads.net - similar to: "[RFC] Allow loop vectorizer to choose vector widths that generate illegal types"

[RFC] Allow loop vectorizer to choose vector widths that generate illegal types

2016 Jun 16

Some thoughts: When determining the VF for a loop with mixed data sizes, choosing the smallest type ensures every vector register used is full, while choosing the largest minimizes the number of vector registers used. Which one is better, or some size in between, depends on the target’s costs for the vector operations, the availability of registers, and possibly control/memory divergence and trip count. “This is
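To make the smallest-versus-largest trade-off concrete, here is a small Python model (an illustrative assumption, not LLVM's cost model) that counts how many vector registers each scalar type needs at a given VF and how full those registers are. The 128-bit register width and the i8/i32 mix are hypothetical.

```python
from typing import Dict, List, Tuple


def registers_needed(vf: int, type_bits: int, register_bits: int) -> int:
    """Vector registers needed to hold `vf` lanes of `type_bits` each (rounded up)."""
    return (vf * type_bits + register_bits - 1) // register_bits


def fill_ratio(vf: int, type_bits: int, register_bits: int) -> float:
    """Fraction of the allocated register bits that actually hold lanes."""
    used = vf * type_bits
    return used / (registers_needed(vf, type_bits, register_bits) * register_bits)


def compare_vf_choices(register_bits: int, type_bits_in_loop: List[int]) -> Dict[str, dict]:
    """Compare a VF derived from the smallest type with one derived from the largest."""
    choices = {
        "smallest": register_bits // min(type_bits_in_loop),
        "largest": register_bits // max(type_bits_in_loop),
    }
    report: Dict[str, dict] = {}
    for name, vf in choices.items():
        per_type: Dict[int, Tuple[int, float]] = {
            t: (registers_needed(vf, t, register_bits), fill_ratio(vf, t, register_bits))
            for t in sorted(set(type_bits_in_loop))
        }
        report[name] = {"vf": vf, "per_type": per_type}
    return report


if __name__ == "__main__":
    # Hypothetical loop mixing i8 and i32 values on a target with 128-bit vector registers.
    for name, info in compare_vf_choices(128, [8, 32]).items():
        print(name, info)
```

With these numbers, the smallest-type choice (VF 16) keeps every register full but needs four registers for each i32 vector, while the largest-type choice (VF 4) uses one register per value but leaves the i8 vectors only 25% full. Which is preferable is exactly the target-cost question raised above.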

[RFC] Allow loop vectorizer to choose vector widths that generate illegal types

2016 Jun 15

Hello, Currently the loop vectorizer will, by default, not consider vectorization factors that would make it generate types that do not fit into the target platform's vector registers. That is, if the widest scalar type in the scalar loop is i64, and the platform's largest vector register is 256-bit wide, we will not consider a VF above 4. We have a command line option (-mllvm
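The default cap described in this post is simple arithmetic: the VF is bounded so that VF lanes of the widest scalar type fit in one vector register (256 / 64 = 4 for i64 on a 256-bit target). The Python sketch below models that rule, together with one possible wider bound (an assumption here) in which the smallest type in the loop sets the limit, so the resulting vector types may exceed one register. It is a simplified model, not the code in LoopVectorize.cpp, and the function names are hypothetical.

```python
def largest_pow2_at_most(n: int) -> int:
    """Largest power of two <= n (returns 1 for n < 2)."""
    p = 1
    while p * 2 <= n:
        p *= 2
    return p


def default_max_vf(widest_register_bits: int, widest_type_bits: int) -> int:
    """Cap used when only legal (register-sized) vector types may be generated."""
    return largest_pow2_at_most(widest_register_bits // widest_type_bits)


def widened_max_vf(widest_register_bits: int, smallest_type_bits: int) -> int:
    """Hypothetical larger cap: the smallest type sets the limit, wider types may be illegal."""
    return largest_pow2_at_most(widest_register_bits // smallest_type_bits)


if __name__ == "__main__":
    # Example from the post: widest type i64, 256-bit vector registers.
    print(default_max_vf(256, 64))
    # Hypothetical: the same loop also operates on i8 values.
    print(widened_max_vf(256, 8))
```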

[LLVMdev] Vectorization factor limitation in Loop Vectorizer

2014 Dec 11

Hi Nadav/Devs, I am exploring the Loop Vectorizer to vectorize i8 scalar operations into 8xi8 vector operations. I was expecting the Loop Vectorizer to analyze the profitability of a vectorization factor (VF) of 8; however, it is not doing so due to the widest-type calculation done for the blocks inside the loop. Maybe I am missing something; however, I am curious to know why the Loop Vectorizer limits the

[LLVMdev] Vectorization factor limitation in Loop Vectorizer

2014 Dec 13

So IMO, if we use TTI to modify the VF calculation for targets/subtargets where a higher VF is supported, the vectorizer’s scope will become wider. Did/do you foresee any issues with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 16

On Wed, Jan 15, 2014 at 5:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > Was the vectorizer successful in unrolling the loop in quantum_sigma_x? I > wonder if 'size’ is typically high or low. No. The vectorizer stated that it wasn't going to bother with the loop because it wasn't profitable. Specifically: LV: Checking a loop in "quantum_sigma_x" LV: Found a

target triple in 3.8

2016 Feb 19

I have some trouble making the SIMD vector length visible to the passes. My application is basically on the level of 'opt'. What I did in version 3.6 was functionPassManager->add(new llvm::TargetLibraryInfo(llvm::Triple(Mod->getTargetTriple()))); functionPassManager->add(new llvm::DataLayoutPass()); and then the -basicaa and -loop-vectorizer were able to vectorize the input

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 16

I am starting to use the sample profiler to analyze new performance opportunities. The loop unroller has popped up in several of the benchmarks I'm running. In particular, libquantum. There is a ~12% opportunity when the runtime unroller is triggered. This helps functions like quantum_sigma_x (http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00149). The function accounts
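Runtime unrolling handles a trip count that is only known at run time by running an unrolled main loop followed by a remainder loop. The Python sketch below shows the shape of that transform on a hypothetical loop that XORs one bit into every element of an array; the loop body and the unroll factor of 4 are assumptions, and this is neither the libquantum source nor LLVM's LoopUnroll implementation.

```python
from typing import List

UNROLL = 4  # hypothetical unroll factor; the body below is written out for exactly 4 copies


def flip_bit_simple(states: List[int], target: int) -> None:
    """Reference loop: XOR bit `target` into every element, one element per iteration."""
    for i in range(len(states)):
        states[i] ^= 1 << target


def flip_bit_runtime_unrolled(states: List[int], target: int) -> None:
    """Same loop after runtime unrolling by UNROLL; the trip count is only known at run time."""
    mask = 1 << target
    n = len(states)
    main_end = n - n % UNROLL  # iterations covered by the unrolled body
    i = 0
    # Unrolled main loop: four copies of the body per loop-back branch.
    while i < main_end:
        states[i] ^= mask
        states[i + 1] ^= mask
        states[i + 2] ^= mask
        states[i + 3] ^= mask
        i += UNROLL
    # Remainder loop for the n % UNROLL leftover iterations.
    while i < n:
        states[i] ^= mask
        i += 1


if __name__ == "__main__":
    a = list(range(10))
    b = list(range(10))
    flip_bit_simple(a, 3)
    flip_bit_runtime_unrolled(b, 3)
    print(a == b)  # True
```

The remainder loop is what allows the transform when the trip count (here `len(states)`) is not a compile-time constant; whether the extra code pays off depends on the trip counts seen at run time, which is where profile information helps.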

[LLVMdev] RFC: LoopEditor, a high-level loop transform toolkit

2015 Jul 28

Hi Michael, +llvmdev,Hal,Nadav For testing, I was currently thinking of a two pronged approach. Lit tests as you suggest with a dummy pass, probably with command line options to define what transform to do, and unit tests to test the delegate behaviour and return values. I'll try and produce a mega patch with at least the loop vectoriser moved over, then split it up again after review.

[LLVMdev] SelectionDAG scalarizes vector operations.

2012 Feb 08

Duncan, I had a few thoughts regarding our short discussion yesterday. I am not sure how we can lower SEXT into the vpmovsx family of instructions. I propose the following strategy for the ZEXT and ANYEXT family of functions. At first, we let the Type Legalizer/VectorOpLegalizer scalarize the code. Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle. This is
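The BUILD_VECTOR-to-shuffle step can work because, on a little-endian target, zero-extending each u8 lane to u16 is the same as interleaving the source bytes with bytes taken from an all-zero vector. The Python sketch below checks that equivalence; it only models the data movement (it is not SelectionDAG code), and the helper names are hypothetical. For ANYEXT the filler bytes may be anything, so an all-zero vector is just one legal choice.

```python
import struct
from typing import List


def zext_via_shuffle(lanes: List[int]) -> List[int]:
    """Model zext <N x i8> -> <N x i16> as a byte shuffle of (lanes, zeros)."""
    zeros = [0] * len(lanes)
    concat = lanes + zeros
    # Shuffle mask over concat: output byte 2*i takes lanes[i] (low byte),
    # output byte 2*i+1 takes zeros[i] (high byte).
    mask: List[int] = []
    for i in range(len(lanes)):
        mask.extend([i, len(lanes) + i])
    shuffled = bytes(concat[m] for m in mask)
    # Reinterpret the shuffled bytes as little-endian u16 lanes.
    return list(struct.unpack("<%dH" % len(lanes), shuffled))


def zext_scalarized(lanes: List[int]) -> List[int]:
    """What the scalarized code computes: one zero-extension per u8 lane."""
    return [x & 0xFF for x in lanes]


if __name__ == "__main__":
    v = [1, 200, 255, 7, 0, 128, 64, 3]
    print(zext_via_shuffle(v) == zext_scalarized(v))  # True
```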

[LLVMdev] Simple Loop Vectorize Question

2013 May 10

Nadav, Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S -debug double.ll' doesn't appear to make a difference. In fact it seems to be ignored as garbage values for -mcpu don't raise an error. Am I overlooking something else also? Many Thanks, Josh On Thu, May 9, 2013 at 6:06 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Josh, > > Your

[LLVMdev] First attempt at recognizing pointer reduction

2013 Oct 21

Hi Arnold, To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization. For instance, even if the relationship between each loop would be complicated, I know that in each loop, all three reads are sequential. So, at least, I could use a

enabling interleaved access loop vectorization

2016 Aug 09

Thanks Ayal! I'll take a look at DENBench. As another data point - I tried enabling this on our internal benchmarks. I'm seeing one regression, and it seems to be a regression of the "good" kind - without interleaving we don't vectorize the innermost loop, and with interleaving we do. The vectorized loop is actually significantly faster when benchmarked in isolation, but in

[LLVMdev] Simple Loop Vectorize Question

2013 May 09

Hi! I am trying to get the loop vectorizer to work on a simple example (http://pastebin.com/tGhpc4y0) that doubles every element in a vector. I've found that 'opt -loop-vectorize -force-vector-width=4 -S -debug double.ll' works as expected. However, removing the -force-vector-width flag results in no vectorization. From the debug output I can see that the issue boils down to: LV: The

[LLVMdev] SelectionDAG scalarizes vector operations.

2012 Feb 08

Hi Nadav, > I had a few thoughts regarding our short discussion yesterday. > > I am not sure how we can lower SEXT into the vpmovsx family of instructions. I propose the following strategy for the ZEXT and ANYEXT family of functions. what I would like to understand first is why there are any vector xEXT nodes at all! As I tried to explain on IRC, I don't think you ever get these

enabling interleaved access loop vectorization

2016 Aug 16

Hi Ayal, Elena, I'd really like to enable this by default. As I wrote above, I didn't see any regressions in internal benchmarks, and there doesn't seem to be anything in SPEC2006 either. I do see a performance improvement in an internal benchmark (that is, a real workload). Would you be able to provide an example that gets pessimized? I have no doubt you've seen regressions

enabling interleaved access loop vectorization

2016 Aug 07

We checked the gathered data again. All regressions that we see are in 32-bit mode. The 64-bit mode looks good overall. - Elena From: Michael Kuperstein [mailto:mkuper at google.com] Sent: Saturday, August 06, 2016 02:56 To: Renato Golin <renato.golin at linaro.org> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Matthew Simpson <mssimpso at codeaurora.org>;

[LLVMdev] Why is the loop vectorizer not working on my function?

2013 Oct 26

----- Original Message ----- > >>> LV: The Widest type: 32 bits. > >>> LV: The Widest register is: 32 bits. > > Yep, we don’t pick up the right TTI. > > Try -march=x86-64 (or leave it out) you already have this info in the > triple. > > Then it should work (does for me with your example below). That may depend on what CPU it picks by default; Frank,
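The two debug lines quoted here are enough to see why nothing vectorizes: with a 32-bit widest register and a 32-bit widest type, only one lane fits, so the VF is 1. The Python sketch below extracts those two values from `-debug` output text and applies that simplified rule; the regular expressions assume the exact `LV: The Widest ...: N bits` wording shown in this thread, which may differ in other LLVM versions, and `max_vf_from_debug` is a hypothetical helper.

```python
import re
from typing import Optional

WIDEST_TYPE_RE = re.compile(r"LV: The Widest type: (\d+) bits")
WIDEST_REG_RE = re.compile(r"LV: The Widest register is: (\d+) bits")


def max_vf_from_debug(log: str) -> Optional[int]:
    """Return register_bits // type_bits from LV debug text, or None if either line is missing."""
    t = WIDEST_TYPE_RE.search(log)
    r = WIDEST_REG_RE.search(log)
    if t is None or r is None:
        return None
    type_bits, reg_bits = int(t.group(1)), int(r.group(1))
    return max(1, reg_bits // type_bits)


if __name__ == "__main__":
    log = "LV: The Widest type: 32 bits.\nLV: The Widest register is: 32 bits.\n"
    print(max_vf_from_debug(log))  # 1: a single lane, so there is nothing to vectorize
```

Supplying a triple and CPU, as the replies in this listing suggest, is what changes the reported register width and therefore the VF this rule yields.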

[LLVMdev] Simple Loop Vectorize Question

2013 May 10

Hi Josh, This line works for me: opt file.ll -loop-vectorize -S -o - -mtriple=x86_64 -mcpu=corei7-avx -debug You need to specify the triple on the command line if it is not inside the module. Thanks, Nadav On May 9, 2013, at 5:53 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > Nadav, > > Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S

RFC: Extending LV to vectorize outerloops

2016 Sep 21

Proposal for extending the Loop Vectorizer to handle Outer Loops. Goal: We propose to extend the innermost Loop Vectorizer to also handle outerloops (cf. [1]). Our aim is to best leverage the efforts already invested in the existing innermost Loop Vectorizer rather than introduce a separate pass dedicated to outerloop

[LLVMdev] Simple Loop Vectorize Question

2013 May 09

Hi Josh, Your module does not have a triple, so the target machine and TargetTransformInfo have no way of knowing if you are running on a machine with vector registers. Try adding the '-mcpu=XXXX' to opt and see what happens. Thanks, Nadav On May 9, 2013, at 1:42 PM, Josh Klontz <josh.klontz at gmail.com> wrote: > Hi! I am trying to get the loop vectorizer to work on a
