Displaying 20 results from an estimated 20000 matches similar to: "RFC: Extending LV to vectorize outerloops"
2017 Dec 06
5
[LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Status Update on VPlan ---- where we are currently, and what's ahead of us
==========================================================
Goal:
-----
Extending the Loop Vectorizer (LV) so that it can handle outer loops, by uplifting its infrastructure to VPlan.
The goal of this status update is to summarize the progress and the future steps needed.
Background:
-----------
This is related to
2017 Dec 06
3
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
Proposal for Outer Loop Vectorization Implementation Plan
=============================================
=====
Goal:
=====
Extending Loop Vectorizer (LV) such that it can handle outer loops, via VPlan infrastructure enhancements.
Understand the trade-offs in trying to make concurrent progress with moving remaining inner loop vectorization
functionality to VPlan infrastructure
===========
2015 Jul 08
7
[LLVMdev] LLVM loop vectorizer
Hello.
I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication) procedure
but the LLVM loop vectorizer is not able to handle such code.
I am using clang and LLVM version 3.4 (on Ubuntu 12.10). I use the -fvectorize option
with clang and -loop-vectorize with opt-3.4.
The CSR SpMV function is inspired from
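For context, a minimal sketch of the loop shape being discussed (all names here are illustrative, not taken from the original post): a typical CSR sparse matrix-vector product. The inner trip count varies per row and the loads through the column-index array are gathers, which is why a vectorizer of that era rejects it.

```c
#include <assert.h>

/* Sketch of a CSR SpMV kernel. The inner loop's trip count
 * (row_ptr[i+1] - row_ptr[i]) is row-dependent, and x[col_idx[j]] is an
 * indirect (gather) load -- both obstacles for the loop vectorizer. */
void csr_spmv(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y) {
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += val[j] * x[col_idx[j]];  /* indirect load via col_idx */
        y[i] = sum;
    }
}
```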
2016 Aug 09
2
enabling interleaved access loop vectorization
Thanks Ayal!
I'll take a look at DENBench.
As another data point - I tried enabling this on our internal benchmarks.
I'm seeing one regression, and it seems to be a regression of the "good"
kind - without interleaving we don't vectorize the innermost loop, and with
interleaving we do. The vectorized loop is actually significantly faster
when benchmarked in isolation, but in
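To illustrate what "interleaved access" means in this thread, here is a hedged sketch (function and variable names are mine, not from the benchmarks discussed): a stride-2 loop that reads two interleaved streams. Without interleaved-access support the vectorizer leaves it scalar; with it, the elements are fetched with wide loads plus shuffles.

```c
#include <assert.h>

/* Illustrative stride-2 (interleaved) access pattern: in[] holds two
 * logical streams, at even and odd indices. Interleaved-access
 * vectorization loads 2*VF contiguous elements and de-interleaves them
 * with shuffles instead of scalarizing. */
void sum_pairs(const int *in, int *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = in[2 * i] + in[2 * i + 1];
}
```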
2016 Jun 04
4
[LLVMdev] LLVM loop vectorizer
Hi Alex,
I think the changes you want are actually not vectorizer related. The vectorizer just uses data provided by other passes.
What you probably want is to look into the routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome :)
Thanks,
Michael
> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>
2016 Feb 18
3
[LLVMdev] LLVM loop vectorizer
Hi Alex,
I'm not aware of efforts on loop coalescing in LLVM, but Polly can probably do something like this. Also, one related thought: it might be worth making it a separate pass, not a part of the loop vectorizer. LLVM already has several 'utility' passes (e.g. loop rotation) which primarily aim at enabling other passes.
Thanks,
Michael
> On Feb 15, 2016, at 6:44 AM, RCU
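For readers unfamiliar with the term, a hedged sketch of what loop coalescing does (this example is mine, not from the thread): a perfect two-deep nest rewritten as a single loop over n*m iterations, recovering the original indices by division and modulo. A utility pass doing this transformation would, as suggested above, primarily serve to enable other passes.

```c
#include <assert.h>

/* Illustrative loop coalescing: the nest
 *   for (i = 0; i < n; ++i)
 *     for (j = 0; j < m; ++j)
 *       a[i*m + j] = i + j;
 * collapsed into one loop with a single induction variable k. */
void init_coalesced(int *a, int n, int m) {
    for (int k = 0; k < n * m; ++k) {
        int i = k / m, j = k % m;  /* recover original indices */
        a[i * m + j] = i + j;
    }
}
```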
2016 Aug 16
2
enabling interleaved access loop vectorization
Hi Ayal, Elena,
I'd really like to enable this by default.
As I wrote above, I didn't see any regressions in internal benchmarks, and
there doesn't seem to be anything in SPEC2006 either. I do see a
performance improvement in an internal benchmark (that is, a real
workload).
Would you be able to provide an example that gets pessimized? I have no
doubt you've seen regressions
2016 Jun 16
2
[RFC] Allow loop vectorizer to choose vector widths that generate illegal types
Some thoughts:
o To determine the VF for a loop with mixed data sizes, choosing the smallest ensures each vector register used is full, choosing the largest will minimize the number of vector registers used. Which one’s better, or some size in between, depends on the target’s costs for the vector operations, availability of registers and possibly control/memory divergence and trip count. “This is
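A minimal example of the mixed-data-size situation described above (the function and names are illustrative, not from the RFC): with 128-bit vectors, VF=16 fills the 8-bit lanes but requires four 32-bit registers per vector value, while VF=4 fills one 32-bit register and leaves the 8-bit registers three-quarters empty.

```c
#include <assert.h>

/* Illustrative loop mixing data sizes: an i8 load is zero-extended and
 * used in i32 arithmetic. The best VF depends on whether the target
 * prefers full narrow registers (large VF) or fewer wide registers
 * (small VF), as the RFC discusses. */
void widen_accumulate(const unsigned char *src, int *dst, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] += src[i] * 3;  /* i8 widened to i32 */
}
```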
2016 Aug 17
2
enabling interleaved access loop vectorization
Thanks Ayal!
On Wed, Aug 17, 2016 at 2:14 PM, Zaks, Ayal <ayal.zaks at intel.com> wrote:
> Hi Michael,
>
>
>
> Don’t quite have a full reproducer for you yet. You’re welcome to try and
> see what’s happening in 32 bit mode when enabling interleaving for the
> following, based on “https://en.wikipedia.org/wiki/YIQ#From_RGB_to_YIQ”:
>
>
>
> void rgb2yik
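The message truncates just as the function begins. As a hedged reconstruction only (NOT the poster's code; coefficients follow the linked Wikipedia page, names and fixed-point scaling are mine), an RGB-to-YIQ loop with per-pixel interleaved byte stores might look like:

```c
#include <assert.h>

/* Hypothetical RGB-to-YIQ conversion in 8.8 fixed point. The per-pixel
 * "*out++" byte stores produce the interleaved narrow-vector accesses
 * this thread is concerned with. */
void rgb_to_yiq(const unsigned char *in, signed char *out, int n) {
    for (int px = 0; px < n; ++px) {
        int r = in[3 * px], g = in[3 * px + 1], b = in[3 * px + 2];
        *out++ = (signed char)((77 * r + 150 * g + 29 * b) >> 8);  /* Y */
        *out++ = (signed char)((153 * r - 70 * g - 82 * b) >> 8);  /* I */
        *out++ = (signed char)((54 * r - 134 * g + 80 * b) >> 8);  /* Q */
    }
}
```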
2016 Jun 07
2
[LLVMdev] LLVM loop vectorizer
Hi Alex,
This has been very recently fixed by Hal. See http://reviews.llvm.org/rL270771
Adam
> On Jun 4, 2016, at 3:13 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hello.
> Mikhail, I come back to this older thread.
> I need to do a few changes to LoopVectorize.cpp.
>
> One of them is related to figuring out the exact C source line
2016 Sep 01
2
enabling interleaved access loop vectorization
So turns out it is a full reproducer after all (choosing to vectorize on AVX), good.
> The details are in PR29025.
Interesting. (So we should carefully insert unconditional branches inside shuffle sequences, eh? ;-)
> But if we modify the program by adding "*out++ = 0" right after "*out++ = q;" (thus eliminating the pesky <12 x i8>), we get:
Indeed such
2016 Aug 07
2
enabling interleaved access loop vectorization
We checked the gathered data again. All regressions that we see are in 32-bit mode. The 64-bit mode looks good overall.
- Elena
From: Michael Kuperstein [mailto:mkuper at google.com]
Sent: Saturday, August 06, 2016 02:56
To: Renato Golin <renato.golin at linaro.org>
Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Matthew Simpson <mssimpso at codeaurora.org>;
2017 Dec 14
3
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
>Another might be to introduce changes under feature flags to ease the revert/reintroduce/revert cycle.
This is essentially the first guard. We plan to have flags/settings to control which types of outer loops are handled.
The new code path is initially exclusive to outer loop vectorization. If we disable all types of outer loops
(and that's the initial default), LV continues to be good
2018 Jan 15
0
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
To revive the discussion around vectorizer testing, here's a quick
sample of a few of the issues hit recently in the loop vectorizer. I
want to be careful to say that I am not stating these are the result of
any recent work, just that they're issues that have been triaged down to
the loop vectorizer doing something incorrect or questionable from a
performance perspective.
2012 Feb 08
5
[LLVMdev] SelectionDAG scalarizes vector operations.
Hi Dave,
>> We generate xEXT nodes in many cases. Unlike GCC which vectorizes
>> inner loops, we vectorize the implicit outermost loop of data-parallel
>> workloads (also called whole function vectorization). We vectorize
>> code even if the user uses xEXT instructions, uses mixed types, etc.
>> We choose a vectorization factor which is likely to generate more
2016 Dec 12
0
[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer
Hi Xinmin,
I have updated the clang patch using the standard name mangling you
suggested - I was not fully aware of the C++ mangling convention “_ZVG”.
I am using “D” for 64-bit NEON and “Q” for 128-bit NEON, which makes NEON
vector symbols look as follows:
_ZVGQN2v__Z1fd
_ZVGDN2v__Z1ff
_ZVGQN4v__Z1ff
Here “Q” means -> NEON 128-bit, “D” means -> NEON 64-bit
Please notice that although
2017 Mar 14
10
[Proposal][RFC] Epilog loop vectorization
Summarizing the discussion on the implementation approaches.
We discussed two approaches: first, running ‘InnerLoopVectorizer’ again on the epilog loop immediately after vectorizing the original loop, within the same vectorization pass; second, re-running the vectorization pass and limiting the vectorization factor of the epilog loop via metadata.
<Approach-2>
Challenges with
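A conceptual sketch of what epilog loop vectorization produces, regardless of which implementation approach is chosen (widths and names here are illustrative; the inner `for (j ...)` loops stand in for vector instructions):

```c
#include <assert.h>

/* Conceptual output of epilog vectorization: a wide main loop (VF=8),
 * a narrower vectorized epilogue (VF=4) for the leftovers, then a
 * scalar remainder. Vector ops are modelled as short fixed-trip loops. */
void saxpy(float a, const float *x, float *y, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8)      /* main vector loop, VF=8 */
        for (int j = 0; j < 8; ++j)
            y[i + j] += a * x[i + j];
    for (; i + 4 <= n; i += 4)      /* vectorized epilogue, VF=4 */
        for (int j = 0; j < 4; ++j)
            y[i + j] += a * x[i + j];
    for (; i < n; ++i)              /* scalar remainder */
        y[i] += a * x[i];
}
```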
2016 Nov 30
5
[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer
Dear all,
I have just created a couple of differential reviews to enable the
vectorisation of loops that have function calls to routines marked with
“#pragma omp declare simd”.
They can be (re)viewed here:
* https://reviews.llvm.org/D27249
* https://reviews.llvm.org/D27250
The current implementation allows the loop vectorizer to generate vector
code for source file as:
#pragma omp declare
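The excerpt cuts off mid-pragma; a hedged sketch of the source pattern the patches target (the function body and names are illustrative, not taken from D27249/D27250): a routine marked `#pragma omp declare simd` whose calls inside a loop the vectorizer may replace with a compiler-generated vector variant.

```c
/* A routine annotated for vectorization: the pragma asks the compiler
 * to also emit vector variants of f, which the loop vectorizer can
 * call instead of scalarizing. (Plain C compilers ignore the pragma.) */
#pragma omp declare simd
double f(double x) {
    return x * x + 1.0;
}

void apply(const double *in, double *out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = f(in[i]);  /* vectorizable via the declared simd variant */
}
```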
2016 Dec 08
6
[RFC] Enable "#pragma omp declare simd" in the LoopVectorizer
Hi Francesco, a bit more information. GCC's veclib is likewise implemented on top of the GCC VectorABI for declare simd.
For name mangling, we have to follow certain C/C++ rules (e.g. the prefix needs to be _ZVG ....). David Majnemer is the owner and stakeholder for approval for Clang and LLVM. Also, we need to pay attention to GCC compatibility. I would suggest you look into how GCC VectorABI can
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 11:58 AM, Michael Kuperstein wrote:
> I'm still not sure about this, for a few reasons:
>
> 1) I'd like to try to treat epilogue loops the same way regardless of
> whether the main loop was vectorized by hand or automatically. So if
> someone hand-wrote an avx-512 16-wide loop, with alias checks, and we
> decide it's profitable to vectorize the