Displaying 20 results from an estimated 20000 matches similar to: "RFC: Extending LV to vectorize outerloops"
2017 Dec 06
5
[LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Status Update on VPlan ---- where we are currently, and what's ahead of us
==========================================================
Goal:
-----
Extending the Loop Vectorizer (LV) so that it can handle outer loops, by uplifting its infrastructure to VPlan.
The goal of this status update is to summarize the progress and the future steps needed.
Background:
-----------
This is related to
2017 Dec 06
3
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
Proposal for Outer Loop Vectorization Implementation Plan
=============================================
=====
Goal:
=====
Extending Loop Vectorizer (LV) such that it can handle outer loops, via VPlan infrastructure enhancements.
Understand the trade-offs in trying to make concurrent progress with moving remaining inner loop vectorization
functionality to VPlan infrastructure
===========
2015 Jul 08
7
[LLVMdev] LLVM loop vectorizer
Hello.
I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication) procedure
but the LLVM loop vectorizer is not able to handle such code.
I am using clang and LLVM version 3.4 (on Ubuntu 12.10). I use the -fvectorize option
with clang and -loop-vectorize with opt-3.4.
The CSR SpMV function is inspired from
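For context, a minimal sketch of the loop shape being discussed (all names here are illustrative, not taken from the original post): a typical CSR sparse matrix-vector product. The inner trip count varies per row and the loads through the column-index array are gathers, which is why a vectorizer of that era rejects it.

```c
#include <assert.h>

/* Sketch of a CSR SpMV kernel. The inner loop's trip count
 * (row_ptr[i+1] - row_ptr[i]) is row-dependent, and x[col_idx[j]] is an
 * indirect (gather) load -- both obstacles for the loop vectorizer. */
void csr_spmv(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y) {
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += val[j] * x[col_idx[j]];  /* indirect load via col_idx */
        y[i] = sum;
    }
}
```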
2016 Aug 09
2
enabling interleaved access loop vectorization
Thanks Ayal!
I'll take a look at DENBench.
As another data point - I tried enabling this on our internal benchmarks.
I'm seeing one regression, and it seems to be a regression of the "good"
kind - without interleaving we don't vectorize the innermost loop, and with
interleaving we do. The vectorized loop is actually significantly faster
when benchmarked in isolation, but in
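To illustrate what "interleaved access" means in this thread, here is a hedged sketch (function and variable names are mine, not from the benchmarks discussed): a stride-2 loop that reads two interleaved streams. Without interleaved-access support the vectorizer leaves it scalar; with it, the elements are fetched with wide loads plus shuffles.

```c
#include <assert.h>

/* Illustrative stride-2 (interleaved) access pattern: in[] holds two
 * logical streams, at even and odd indices. Interleaved-access
 * vectorization loads 2*VF contiguous elements and de-interleaves them
 * with shuffles instead of scalarizing. */
void sum_pairs(const int *in, int *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = in[2 * i] + in[2 * i + 1];
}
```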
2016 Jun 04
4
[LLVMdev] LLVM loop vectorizer
Hi Alex,
I think the changes you want are actually not vectorizer related. The vectorizer just uses data provided by other passes.
What you probably want is to look into the routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome :)
Thanks,
Michael
> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>
2016 Feb 18
3
[LLVMdev] LLVM loop vectorizer
Hi Alex,
I'm not aware of efforts on loop coalescing in LLVM, but Polly can probably do something like this. Also, one related thought: it might be worth making it a separate pass, not a part of the loop vectorizer. LLVM already has several 'utility' passes (e.g. loop rotation) which primarily aim at enabling other passes.
Thanks,
Michael
> On Feb 15, 2016, at 6:44 AM, RCU
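For readers unfamiliar with the term, a hedged sketch of what loop coalescing does (this example is mine, not from the thread): a perfect two-deep nest rewritten as a single loop over n*m iterations, recovering the original indices by division and modulo. A utility pass doing this transformation would, as suggested above, primarily serve to enable other passes.

```c
#include <assert.h>

/* Illustrative loop coalescing: the nest
 *   for (i = 0; i < n; ++i)
 *     for (j = 0; j < m; ++j)
 *       a[i*m + j] = i + j;
 * collapsed into one loop with a single induction variable k. */
void init_coalesced(int *a, int n, int m) {
    for (int k = 0; k < n * m; ++k) {
        int i = k / m, j = k % m;  /* recover original indices */
        a[i * m + j] = i + j;
    }
}
```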
2016 Aug 16
2
enabling interleaved access loop vectorization
Hi Ayal, Elena,
I'd really like to enable this by default.
As I wrote above, I didn't see any regressions in internal benchmarks, and
there doesn't seem to be anything in SPEC2006 either. I do see a
performance improvement in an internal benchmark (that is, a real
workload).
Would you be able to provide an example that gets pessimized? I have no
doubt you've seen regressions
2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Some thoughts:
o To determine the VF for a loop with mixed data sizes, choosing the smallest ensures each vector register used is full, choosing the largest will minimize the number of vector registers used. Which one’s better, or some size in between, depends on the target’s costs for the vector operations, availability of registers and possibly control/memory divergence and trip count. “This is
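A minimal example of the mixed-data-size situation described above (the function and names are illustrative, not from the RFC): with 128-bit vectors, VF=16 fills the 8-bit lanes but requires four 32-bit registers per vector value, while VF=4 fills one 32-bit register and leaves the 8-bit registers three-quarters empty.

```c
#include <assert.h>

/* Illustrative loop mixing data sizes: an i8 load is zero-extended and
 * used in i32 arithmetic. The best VF depends on whether the target
 * prefers full narrow registers (large VF) or fewer wide registers
 * (small VF), as the RFC discusses. */
void widen_accumulate(const unsigned char *src, int *dst, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] += src[i] * 3;  /* i8 widened to i32 */
}
```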
2016 Aug 17
2
enabling interleaved access loop vectorization
Thanks Ayal!
On Wed, Aug 17, 2016 at 2:14 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote:
> Hi Michael,
>
>
>
> Don’t quite have a full reproducer for you yet. You’re welcome to try and
> see what’s happening in 32 bit mode when enabling interleaving for the
> following, based on “https://en.wikipedia.org/wiki/YIQ#From_RGB_to_YIQ”:
>
>
>
> void rgb2yik
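The message truncates just as the function begins. As a hedged reconstruction only (NOT the poster's code; coefficients follow the linked Wikipedia page, names and fixed-point scaling are mine), an RGB-to-YIQ loop with per-pixel interleaved byte stores might look like:

```c
#include <assert.h>

/* Hypothetical RGB-to-YIQ conversion in 8.8 fixed point. The per-pixel
 * "*out++" byte stores produce the interleaved narrow-vector accesses
 * this thread is concerned with. */
void rgb_to_yiq(const unsigned char *in, signed char *out, int n) {
    for (int px = 0; px < n; ++px) {
        int r = in[3 * px], g = in[3 * px + 1], b = in[3 * px + 2];
        *out++ = (signed char)((77 * r + 150 * g + 29 * b) >> 8);  /* Y */
        *out++ = (signed char)((153 * r - 70 * g - 82 * b) >> 8);  /* I */
        *out++ = (signed char)((54 * r - 134 * g + 80 * b) >> 8);  /* Q */
    }
}
```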
2016 Jun 07
2
[LLVMdev] LLVM loop vectorizer
Hi Alex,
This has been very recently fixed by Hal. See http://reviews.llvm.org/rL270771
Adam
> On Jun 4, 2016, at 3:13 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hello.
> Mikhail, I come back to this older thread.
> I need to do a few changes to LoopVectorize.cpp.
>
> One of them is related to figuring out the exact C source line
2016 Sep 01
2
enabling interleaved access loop vectorization
So turns out it is a full reproducer after all (choosing to vectorize on AVX), good.
> The details are in PR29025.
Interesting. (So we should carefully insert unconditional branches inside shuffle sequences, eh? ;-)
> But if we modify the program by adding "*out++ = 0" right after "*out++ = q;" (thus eliminating the pesky <12 x i8>), we get:
Indeed such
2016 Aug 07
2
enabling interleaved access loop vectorization
We checked the gathered data again. All regressions that we see are in 32-bit mode. The 64-bit mode looks good overall.
- Elena
From: Michael Kuperstein [mailto:mkuper at google.com]
Sent: Saturday, August 06, 2016 02:56
To: Renato Golin <renato.golin at linaro.org>
Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Matthew Simpson <mssimpso at codeaurora.org>;
2017 Dec 14
3
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
>Another might be to introduce changes under feature flags to ease the revert/reintroduce/revert cycle.
This is essentially the first guard. We plan to have flags/settings to control which types of outer loops are handled.
The new code path is initially exclusive to outer loop vectorization. If we disable all types of outer loops
(and that's the initial default), LV continues to be good
2018 Jan 15
0
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
To revive the discussion around vectorizer testing, here's a quick
sample of a few of the issues hit recently in the loop vectorizer. I
want to be careful to say that I am not stating these are the result of
any recent work, just that they're issues that have been triaged down to
the loop vectorizer doing something incorrect or questionable from a
performance perspective.
2012 Feb 08
5
[LLVMdev] SelectionDAG scalarizes vector operations.
Hi Dave,
>> We generate xEXT nodes in many cases. Unlike GCC which vectorizes
>> inner loops, we vectorize the implicit outermost loop of data-parallel
>> workloads (also called whole function vectorization). We vectorize
>> code even if the user uses xEXT instructions, uses mixed types, etc.
>> We choose a vectorization factor which is likely to generate more
2016 Dec 12
0
[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer
Hi Xinmin,
I have updated the clang patch using the standard name mangling you
suggested - I was not fully aware of the C++ mangling convention “_ZVG”.
I am using “D” for 64-bit NEON and “Q” for 128-bit NEON, which makes NEON
vector symbols look as follows:
_ZVGQN2v__Z1fd
_ZVGDN2v__Z1ff
_ZVGQN4v__Z1ff
Here “Q” means -> NEON 128-bit, “D” means -> NEON 64-bit
Please notice that although
2017 Mar 14
10
[Proposal][RFC] Epilog loop vectorization
Summarizing the discussion on the implementation approaches.
We discussed two approaches: first, running ‘InnerLoopVectorizer’ again on the epilog loop immediately after vectorizing the original loop, within the same vectorization pass; second, re-running the vectorization pass and limiting the vectorization factor of the epilog loop via metadata.
<Approach-2>
Challenges with
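A conceptual sketch of what epilog loop vectorization produces, regardless of which implementation approach is chosen (widths and names here are illustrative; the inner `for (j ...)` loops stand in for vector instructions):

```c
#include <assert.h>

/* Conceptual output of epilog vectorization: a wide main loop (VF=8),
 * a narrower vectorized epilogue (VF=4) for the leftovers, then a
 * scalar remainder. Vector ops are modelled as short fixed-trip loops. */
void saxpy(float a, const float *x, float *y, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8)      /* main vector loop, VF=8 */
        for (int j = 0; j < 8; ++j)
            y[i + j] += a * x[i + j];
    for (; i + 4 <= n; i += 4)      /* vectorized epilogue, VF=4 */
        for (int j = 0; j < 4; ++j)
            y[i + j] += a * x[i + j];
    for (; i < n; ++i)              /* scalar remainder */
        y[i] += a * x[i];
}
```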
2016 Nov 30
5
[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer
Dear all,
I have just created a couple of differential reviews to enable the
vectorisation of loops that have function calls to routines marked with
“#pragma omp declare simd”.
They can be (re)viewed here:
* https://reviews.llvm.org/D27249
* https://reviews.llvm.org/D27250
The current implementation allows the loop vectorizer to generate vector
code for source file as:
#pragma omp declare
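The excerpt cuts off mid-pragma; a hedged sketch of the source pattern the patches target (the function body and names are illustrative, not taken from D27249/D27250): a routine marked `#pragma omp declare simd` whose calls inside a loop the vectorizer may replace with a compiler-generated vector variant.

```c
/* A routine annotated for vectorization: the pragma asks the compiler
 * to also emit vector variants of f, which the loop vectorizer can
 * call instead of scalarizing. (Plain C compilers ignore the pragma.) */
#pragma omp declare simd
double f(double x) {
    return x * x + 1.0;
}

void apply(const double *in, double *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = f(in[i]);  /* vectorizable via the declared simd variant */
}
```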
2016 Dec 08
6
[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer
Hi Francesco, a bit more information. GCC's veclib is likewise implemented on top of the GCC VectorABI for declare simd.
For name mangling, we have to follow certain C/C++ rules (e.g. the prefix needs to be _ZVG ....). David Majnemer is the owner and stakeholder for approval for Clang and LLVM. Also, we need to pay attention to GCC compatibility. I would suggest you look into how GCC VectorABI can
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 11:58 AM, Michael Kuperstein wrote:
> I'm still not sure about this, for a few reasons:
>
> 1) I'd like to try to treat epilogue loops the same way regardless of
> whether the main loop was vectorized by hand or automatically. So if
> someone hand-wrote an avx-512 16-wide loop, with alias checks, and we
> decide it's profitable to vectorize the