Displaying 20 results from an estimated 10000 matches similar to: "Vectorizing remainder loop"
2018 Aug 03
2
Vectorizing remainder loop
>it cannot afford large size masks for large vectors
So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch.
I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though.
Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s
2018 Jul 29
2
Vectorizing remainder loop
Hello, I'm working on hardware with very large vector widths, up to v2048.
When I vectorize using the default LLVM vectorizer, up to 2047 iterations
are left in the scalar remainder loop. LLVM does not vectorize these, which
increases the cost. However, they should be vectorized using the next
available vector widths, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4.....
The issue of scalar remainder loop has
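
The decomposition requested in this post can be sketched as plain arithmetic. This is a minimal Python illustration only; the function name remainder_widths and its parameters are hypothetical, and it makes no claim about how LLVM's vectorizer handles (or would handle) the remainder.

```python
def remainder_widths(trip_count, max_width=2048, min_width=4):
    """Split a trip count into chunks of decreasing power-of-two width.

    Returns (chunks, scalar_tail), where chunks is a list of
    (vector_width, number_of_vector_iterations) pairs and scalar_tail is
    the number of iterations left for a scalar loop.
    Assumes max_width and min_width are powers of two (hypothetical inputs).
    """
    chunks = []
    remaining = trip_count
    width = max_width
    while width >= min_width:
        # How many full vectors of this width fit in what is left?
        count, remaining = divmod(remaining, width)
        if count:
            chunks.append((width, count))
        width //= 2  # fall back to the next narrower width
    return chunks, remaining


if __name__ == "__main__":
    chunks, tail = remainder_widths(2047)
    for width, count in chunks:
        print(f"v{width}: {count} iteration(s)")
    print(f"scalar tail: {tail}")
```

With a leftover of 2047 iterations, each width from v1024 down to v4 is used once and 3 iterations remain scalar.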
2017 Feb 27
4
[Proposal][RFC] Epilog loop vectorization
Thanks for looking into this.
1) Issues with rerunning the vectorizer:
The vectorizer might generate redundant alias checks while vectorizing the epilog loop.
Redundant alias checks are expensive; we would like to reuse the results of already computed alias checks.
With metadata we can limit the width of the epilog loop, but we are not sure about reusing the alias check results.
Any thoughts on rerunning vectorizer with reusing
2017 Feb 23
2
[Proposal][RFC] Epilog loop vectorization
On 02/22/2017 11:52 AM, Adam Nemet via llvm-dev wrote:
> Hi Ashutosh,
>
>> On Feb 22, 2017, at 1:57 AM, Nema, Ashutosh via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>> This is a proposal about epilog loop vectorization.
>> Currently Loop Vectorizer inserts an epilogue loop for handling loops
2018 Jul 02
8
[RFC][VECLIB] how should we legalize VECLIB calls?
On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>
> >It may not be a full solution for the problems you're trying to solve
>
> If we are inventing a new solution, I’d like it also to solve OpenMP
> declare simd legalization issue. If a small extension of existing scheme
> works for mathlib only, I’m happy to take that and discuss OpenMP
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, we are also interested in making LLVM
generate vector math library calls that are available with glibc (version >
2.22).
reference: https://sourceware.org/glibc/wiki/libmvec
Using the example case given in the reference, we found there are 2 vector
versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx)
version and _ZGVdN4v_sin
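
The two names above differ only in the ISA letter. As a rough illustration, the sketch below takes such a name apart, assuming the x86-64 vector function ABI layout _ZGV<isa><mask><vlen><params>_<name>. The helper decode_vector_name is hypothetical and is not an LLVM or glibc API; parameter codes are returned as a raw string rather than interpreted.

```python
import re

# ISA letters as used in x86-64 vector function names (assumed mapping).
ISA_NAMES = {"b": "SSE", "c": "AVX", "d": "AVX2", "e": "AVX-512"}

# _ZGV <isa> <mask: N=unmasked, M=masked> <vlen> <param codes> _ <scalar name>
_PATTERN = re.compile(r"_ZGV([bcde])([NM])(\d+)([a-zA-Z0-9]+)_(.+)")


def decode_vector_name(mangled):
    """Return the fields of a vector-variant name, or None if it does not match."""
    match = _PATTERN.fullmatch(mangled)
    if match is None:
        return None
    isa, mask, vlen, params, scalar_name = match.groups()
    return {
        "isa": ISA_NAMES[isa],
        "masked": mask == "M",
        "vlen": int(vlen),
        "params": params,          # e.g. "v" = one vector argument
        "scalar_name": scalar_name,
    }


if __name__ == "__main__":
    for name in ("_ZGVcN4v_sin", "_ZGVdN4v_sin"):
        print(name, decode_vector_name(name))
```

Both names decode to the same vector length (4) and scalar function (sin), which is why the VF alone does not distinguish them.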
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
It may not be a full solution for the problems you're trying to solve, but
I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a
problem in itself. Certainly, it's a mess that could be organized,
especially so we're not repeating everything for each data type as we do
right now.
So yes, I think that would allow us to remove the VecLib mappings because
we are
2017 Dec 06
3
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
Proposal for Outer Loop Vectorization Implementation Plan
=============================================
=====
Goal:
=====
Extending Loop Vectorizer (LV) such that it can handle outer loops, via VPlan infrastructure enhancements.
Understand the trade-offs in trying to make concurrent progress with moving remaining inner loop vectorization
functionality to VPlan infrastructure
===========
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern was raised for cases where the Loop Vectorizer generates
types bigger than the target supports:
Currently, based on the VF, we check the cost and generate the expected set of
instruction[s] for the bigger type. This has two challenges: for bigger types
the cost is not always correct, and code generation may not produce efficient
instruction[s].
Probably can depend on the support provided by below RFC by
2017 Dec 14
3
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
>Another might be to introduce changes under feature flags to ease the revert/reintroduce/revert cycle.
This is essentially the first guard. We plan to have flags/settings to control which types of outer loops are handled.
The new code path is initially exclusive to outer loop vectorization. If we disable all types of outer loops
(and that's the initial default), LV continues to be good
2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can
>do a great job optimizing.
Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates
the output of vectorizer:
If vectorizer is the best place to perform the optimization, it should do so.
This includes the cases like
2018 Jun 29
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Ashutosh,
Thanks for the reply.
A related earlier topic appears in the review of the SVML patch (@mmasten). Adding a few names from there.
https://reviews.llvm.org/D19544
There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now
in the trunk is "not legal enough". I'll work on the patch to stop
2017 Dec 06
5
[LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Status Update on VPlan ---- where we are currently, and what's ahead of us
==========================================================
Goal:
-----
Extending Loop Vectorizer (LV) such that it can handle outer loops, via uplifting its infrastructure with VPlan.
The goal of this status update is to summarize the progress and the future steps needed.
Background:
-----------
This is related to
2016 Jun 15
3
[Proposal][RFC] Strided Memory Access Vectorization
Sorry for the spam. Copy-paste didn't capture the Subject properly. Resending with the correct Subject so that the thread is captured properly.
-----Original Message-----
From: Saito, Hideki
Sent: Wednesday, June 15, 2016 1:39 PM
To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [Proposal][RFC] Strided Memory Access
Ashutosh,
First,
2018 Jan 15
0
[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan
To revive the discussion around vectorizer testing, here's a quick
sample of a few of the issues hit recently in the loop vectorizer. I
want to be careful to say that I am not stating these are the result of
any recent work, just that they're issues that have been triaged down to
the loop vectorizer doing something incorrect or questionable from a
performance perspective.
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC so that it proceeds sooner rather than later.
I plan to pitch in (e.g., perf experiments).
>Probably can depend on the support provided by below RFC by Michael:
> "Allow loop vectorizer to choose vector widths that generate illegal types"
>In that case Loop Vectorizer will
2018 Jul 04
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Hi,
On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki <hideki.saito at intel.com>;
> Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com
2018 Jul 24
2
KNL Vectorization with larger vector width
Hello,
I need help here. I am able to adjust the vector width through the
WidestRegister value. When the number of iterations is 31 and I set the
vector width to 32, it gives <16xi32> and <8xi32> instructions.
However, if I replicate the same behavior with the number of iterations at 63
and the vector width set to 64, no vector instructions are emitted. It should
behave as before and give <32xi32> and
2018 Jan 06
2
RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
Amara,
>I support this direction
Thanks for the support.
>but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early bailouts in some form.
It's not like I have specific application code in
2017 Aug 07
3
VBROADCAST Implementation Issues
Thank you. I am still getting errors. I have modified my instructions as you
said, as follows:
def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb),
(ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2),
"GATHER_256B\t{$src2, {$dst} {${mask}}|${dst}
{${mask}}, $src2}",
[(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32
(masked_gather