similar to: [RFC] Allow loop vectorizer to choose vector widths that generate illegal types


2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Some thoughts: o To determine the VF for a loop with mixed data sizes, choosing the smallest ensures each vector register used is full, choosing the largest will minimize the number of vector registers used. Which one’s better, or some size in between, depends on the target’s costs for the vector operations, availability of registers and possibly control/memory divergence and trip count. “This is
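The smallest-vs-largest trade-off in the snippet above can be made concrete with a small sketch. The helpers below are hypothetical and are not LLVM's cost model. They assume one fixed vector register width and ignore instruction costs, register pressure, divergence, and trip count. They only report, for each policy, the VF, how many registers one vector of the widest type occupies, and how full a register holding the narrowest type is.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical illustration, not LLVM code: compare the two VF policies for a
// loop that mixes scalar widths, assuming a single fixed register width.
struct VFChoice {
  unsigned vf;             // vectorization factor (number of lanes)
  unsigned regsForWidest;  // registers needed for one vector of the widest type
  unsigned narrowFillPct;  // percent of a register filled by the narrowest type
};

static VFChoice evaluate(unsigned vf, unsigned narrowBits, unsigned wideBits,
                         unsigned regBits) {
  unsigned regs = (vf * wideBits + regBits - 1) / regBits;  // ceiling division
  unsigned fill = vf * narrowBits * 100 / regBits;
  return {vf, regs, fill};
}

// VF from the smallest type: every register is full, but each vector of the
// widest type is split across several registers. typeBits must be non-empty.
VFChoice vfFromSmallestType(const std::vector<unsigned>& typeBits,
                            unsigned regBits) {
  unsigned lo = *std::min_element(typeBits.begin(), typeBits.end());
  unsigned hi = *std::max_element(typeBits.begin(), typeBits.end());
  return evaluate(regBits / lo, lo, hi, regBits);
}

// VF from the largest type: the fewest registers are used, but registers
// holding narrower types are only partly filled. typeBits must be non-empty.
VFChoice vfFromLargestType(const std::vector<unsigned>& typeBits,
                           unsigned regBits) {
  unsigned lo = *std::min_element(typeBits.begin(), typeBits.end());
  unsigned hi = *std::max_element(typeBits.begin(), typeBits.end());
  return evaluate(regBits / hi, lo, hi, regBits);
}
```

For a loop using i8 and i32 with 128-bit registers, the smallest-type policy gives VF 16 with each i32 vector spread over 4 registers, while the largest-type policy gives VF 4 with the i8 registers only 25% full. Which is better is exactly the target-cost question raised in the post.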
2016 Jun 15
8
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Hello, Currently the loop vectorizer will, by default, not consider vectorization factors that would make it generate types that do not fit into the target platform's vector registers. That is, if the widest scalar type in the scalar loop is i64, and the platform's largest vector register is 256-bit wide, we will not consider a VF above 4. We have a command line option (-mllvm
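The default limit described above is essentially the widest vector register divided by the widest scalar type in the loop. The sketch below is a hypothetical simplification: the function name is made up, and rounding down to a power of two is an assumption of this sketch. In LLVM the two widths come from TargetTransformInfo and the loop's instructions.

```cpp
#include <algorithm>

// Hypothetical sketch of the default clamp: the largest VF considered is the
// number of widest-type lanes that fit in the widest vector register.
// Assumes widestTypeBits > 0. A result of 1 means "do not vectorize".
unsigned maxDefaultVF(unsigned widestRegisterBits, unsigned widestTypeBits) {
  unsigned lanes = std::max(1u, widestRegisterBits / widestTypeBits);
  // Round down to a power of two (an assumption of this sketch).
  unsigned vf = 1;
  while (vf * 2 <= lanes) vf *= 2;
  return vf;
}
```

With i64 as the widest type and 256-bit registers, `maxDefaultVF(256, 64)` is 4, matching the example in the post. When the target information is missing and the widest register is reported as 32 bits with a 32-bit widest type, as in the debug output quoted in one of the threads, the cap is 1, so no vectorization is considered.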
2014 Dec 11
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
Hi Nadav/Devs I am exploring the Loop Vectorizer to vectorize i8 scalar operations into 8xi8 vector operations. I was expecting the Loop Vectorizer to analyze the profitability of a vectorization factor (VF) of 8; however, it is not doing so due to the widest type calculation done for the blocks inside the loop. Maybe I am missing something, however, I am curious to know why the Loop Vectorizer limits the
2014 Dec 13
2
[LLVMdev] Vectorization factor limitation in Loop Vectorizer
So IMO, if we use TTI to modify the VF calculation for targets/subtargets where a higher VF is supported, the vectorizer’s scope will become wider. Do you foresee any issues with this? Thanks, Shahid From: Nadav Rotem [mailto:nrotem at apple.com] Sent: Saturday, December 13, 2014 2:47 AM To: Shahid, Asghar-ahmad Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Vectorization factor limitation in
2014 Jan 16
3
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On Wed, Jan 15, 2014 at 5:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > Was the vectorizer successful in unrolling the loop in quantum_sigma_x? I > wonder if 'size’ is typically high or low. No. The vectorizer stated that it wasn't going to bother with the loop because it wasn't profitable. Specifically: LV: Checking a loop in "quantum_sigma_x" LV: Found a
2016 Feb 19
2
target triple in 3.8
I have some trouble making the SIMD vector length visible to the passes. My application is basically on the level of 'opt'. What I did in version 3.6 was functionPassManager->add(new llvm::TargetLibraryInfo(llvm::Triple(Mod->getTargetTriple()))); functionPassManager->add(new llvm::DataLayoutPass()); and then the -basicaa and -loop-vectorizer were able to vectorize the input
2014 Jan 16
11
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
I am starting to use the sample profiler to analyze new performance opportunities. The loop unroller has popped up in several of the benchmarks I'm running. In particular, libquantum. There is a ~12% opportunity when the runtime unroller is triggered. This helps functions like quantum_sigma_x (http://sourcecodebrowser.com/libquantum/0.2.4/gates_8c_source.html#l00149). The function accounts
2015 Jul 28
2
[LLVMdev] RFC: LoopEditor, a high-level loop transform toolkit
Hi Michael, +llvmdev,Hal,Nadav For testing, I was currently thinking of a two pronged approach. Lit tests as you suggest with a dummy pass, probably with command line options to define what transform to do, and unit tests to test the delegate behaviour and return values. I'll try and produce a mega patch with at least the loop vectoriser moved over, then split it up again after review.
2012 Feb 08
2
[LLVMdev] SelectionDAG scalarizes vector operations.
Duncan, I had a few thoughts regarding our short discussion yesterday. I am not sure how we can lower SEXT into the vpmovsx family of instructions. I propose the following strategy for the ZEXT and ANYEXT family of functions. At first, we let the Type Legalizer/VectorOpLegalizer scalarize the code. Next, we allow the dag-combiner to convert the BUILD_VECTOR node into a shuffle. This is
2013 May 10
2
[LLVMdev] Simple Loop Vectorize Question
Nadav, Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S -debug double.ll' doesn't appear to make a difference. In fact it seems to be ignored as garbage values for -mcpu don't raise an error. Am I overlooking something else also? Many Thanks, Josh On Thu, May 9, 2013 at 6:06 PM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Josh, > > Your
2013 Oct 21
2
[LLVMdev] First attempt at recognizing pointer reduction
Hi Arnold, To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization. For instance, even if the relationship between each loop would be complicated, I know that in each loop, all three reads are sequential. So, at least, I could use a
2016 Aug 09
2
enabling interleaved access loop vectorization
Thanks Ayal! I'll take a look at DENBench. As another data point - I tried enabling this on our internal benchmarks. I'm seeing one regression, and it seems to be a regression of the "good" kind - without interleaving we don't vectorize the innermost loop, and with interleaving we do. The vectorized loop is actually significantly faster when benchmarked in isolation, but in
2013 May 09
2
[LLVMdev] Simple Loop Vectorize Question
Hi! I am trying to get the loop vectorizer to work on a simple example (http://pastebin.com/tGhpc4y0) that doubles every element in a vector. I've found that 'opt -loop-vectorize -force-vector-width=4 -S -debug double.ll' works as expected. However, removing the -force-vector-width flag results in no vectorization. From the debug output I can see that the issue boils down to: LV: The
2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
Hi Nadav, > I had a few thoughts regarding our short discussion yesterday. > > I am not sure how we can lower SEXT into the vpmovsx family of instructions. I propose the following strategy for the ZEXT and ANYEXT family of functions. what I would like to understand first is why there are any vector xEXT nodes at all! As I tried to explain on IRC, I don't think you ever get these
2016 Aug 16
2
enabling interleaved access loop vectorization
Hi Ayal, Elena, I'd really like to enable this by default. As I wrote above, I didn't see any regressions in internal benchmarks, and there doesn't seem to be anything in SPEC2006 either. I do see a performance improvement in an internal benchmark (that is, a real workload). Would you be able to provide an example that gets pessimized? I have no doubt you've seen regressions
2016 Aug 07
2
enabling interleaved access loop vectorization
We checked the gathered data again. All regressions that we see are in 32-bit mode. The 64-bit mode looks good overall. - Elena From: Michael Kuperstein [mailto:mkuper at google.com] Sent: Saturday, August 06, 2016 02:56 To: Renato Golin <renato.golin at linaro.org> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Matthew Simpson <mssimpso at codeaurora.org>;
2013 Oct 26
3
[LLVMdev] Why is the loop vectorizer not working on my function?
----- Original Message ----- > >>> LV: The Widest type: 32 bits. > >>> LV: The Widest register is: 32 bits. > > Yep, we don’t pick up the right TTI. > > Try -march=x86-64 (or leave it out) you already have this info in the > triple. > > Then it should work (does for me with your example below). That may depend on what CPU it picks by default; Frank,
2013 May 10
0
[LLVMdev] Simple Loop Vectorize Question
Hi Josh, This line works for me: opt file.ll -loop-vectorize -S -o - -mtriple=x86_64 -mcpu=corei7-avx -debug You need to specify the triple on the command line if it is not inside the module. Thanks, Nadav On May 9, 2013, at 5:53 PM, Joshua Klontz <josh.klontz at gmail.com> wrote: > Nadav, > > Please forgive my ignorance, but 'opt -mcpu=corei7 -loop-vectorize -S
2016 Sep 21
5
RFC: Extending LV to vectorize outerloops
Proposal for extending the Loop Vectorizer to handle Outer Loops ================================================================ Goal: ----- We propose to extend the innermost Loop Vectorizer to also handle outerloops (cf.[1]). Our aim is to best leverage the efforts already invested in the existing innermost Loop Vectorizer rather than introduce a separate pass dedicated to outerloop
2013 May 09
0
[LLVMdev] Simple Loop Vectorize Question
Hi Josh, Your module does not have a triple, so the target machine and TargetTransformInfo have no way of knowing if you are running on a machine with vector registers. Try adding '-mcpu=XXXX' to opt and see what happens. Thanks, Nadav On May 9, 2013, at 1:42 PM, Josh Klontz <josh.klontz at gmail.com> wrote: > Hi! I am trying to get the loop vectorizer to work on a