thr3ads.net - similar to: "Vectorizing remainder loop"

2018 Aug 03

Vectorizing remainder loop

> it cannot afford large size masks for large vectors
So, even a standard way of vectorizing the remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch. I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though. Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s

2018 Jul 29

Vectorizing remainder loop

Hello, I'm working on hardware with a very large vector width, up to v2048. When I vectorize using the LLVM default vectorizer, up to 2047 iterations end up in the scalar remainder loop. These are not vectorized by LLVM, which increases the cost. However, they should be vectorized using the next available vector width, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4, ... The issue of scalar remainder loop has
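
A minimal sketch of the decomposition the post asks for, assuming the widths are powers of two from v2048 down and that every narrower width is usable: after the full-width loop, the leftover count is covered by at most one chunk of each smaller width, read off from the binary representation of the remainder. `plan_remainder` is a hypothetical helper for illustration, not an LLVM API.

```python
def plan_remainder(n, max_vf=2048):
    """Hypothetical helper: split a trip count n into full max_vf-wide
    iterations plus at most one chunk of each smaller power-of-two width.

    Assumes max_vf is a power of two and that every width from max_vf/2
    down to 1 can be used (width 1 stands for a scalar iteration).
    """
    if max_vf <= 0 or max_vf & (max_vf - 1):
        raise ValueError("max_vf must be a power of two")
    main_iters, rem = divmod(n, max_vf)  # rem is at most max_vf - 1
    widths = []
    vf = max_vf // 2
    while vf >= 1:
        if rem & vf:  # bit set => one chunk of width vf
            widths.append(vf)
        vf //= 2
    return main_iters, widths


# Worst case from the post: 2047 leftover iterations.
print(plan_remainder(2047))
```

For a 5000-iteration loop the same helper gives two full v2048 iterations followed by v512, v256, v128 and v8 chunks. Widths below the target's narrowest vector width would stay scalar in practice; the sketch only shows the arithmetic, not how the LLVM vectorizer makes this decision.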

2017 Feb 27

[Proposal][RFC] Epilog loop vectorization

Thanks for looking into this. 1) Issues with re-running the vectorizer: the vectorizer might generate redundant alias checks while vectorizing the epilog loop. Redundant alias checks are expensive, so we would like to reuse the results of the already computed alias checks. With metadata we can limit the width of the epilog loop, but I'm not sure about reusing the alias check result. Any thoughts on rerunning vectorizer with reusing
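
To make the point about reusing alias checks concrete, here is a plain-Python model of the loop shape under discussion: a main vector loop, a narrower vectorized epilog, and a scalar remainder, where the runtime alias check is evaluated once and its result guards both vector loops. The kernel `mem[dst + i] = mem[src + i] + 1`, the function name and the VFs 8 and 4 are illustrative assumptions, and vector chunks are emulated by reading a whole slice before writing it; this is not the code LLVM emits.

```python
def add_one_vectorized(mem, dst, src, n, main_vf=8, epi_vf=4):
    """Hypothetical model of mem[dst + i] = mem[src + i] + 1 for i < n,
    with a main vector loop, a vectorized epilog and a scalar remainder.
    Returns a trace of which loop handled each chunk."""
    # Runtime alias check: computed once, reused by the epilog loop.
    no_alias = dst + n <= src or src + n <= dst
    trace = []
    i = 0
    if no_alias:
        # Main vector loop: load a whole chunk, then store it.
        while i + main_vf <= n:
            chunk = [x + 1 for x in mem[src + i:src + i + main_vf]]
            mem[dst + i:dst + i + main_vf] = chunk
            trace.append(("main", main_vf))
            i += main_vf
        # Vectorized epilog: narrower VF, same alias-check result.
        while i + epi_vf <= n:
            chunk = [x + 1 for x in mem[src + i:src + i + epi_vf]]
            mem[dst + i:dst + i + epi_vf] = chunk
            trace.append(("epilog", epi_vf))
            i += epi_vf
    # Scalar remainder, or the whole loop if the ranges overlap.
    while i < n:
        mem[dst + i] = mem[src + i] + 1
        trace.append(("scalar", 1))
        i += 1
    return trace


mem = list(range(40))
print(add_one_vectorized(mem, 20, 0, 15))  # 8 + 4 vector lanes, 3 scalar
```

With overlapping ranges such as `dst=1, src=0` on zeroed memory, the scalar recurrence produces 1, 2, 3, ..., which the chunked form would not reproduce, so the model falls back to the scalar loop for every iteration.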

2017 Feb 23

[Proposal][RFC] Epilog loop vectorization

On 02/22/2017 11:52 AM, Adam Nemet via llvm-dev wrote: > Hi Ashutosh, > >> On Feb 22, 2017, at 1:57 AM, Nema, Ashutosh via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> Hi, >> This is a proposal about epilog loop vectorization. >> Currently Loop Vectorizer inserts an epilogue loop for handling loops

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

On 07/02/2018 04:33 PM, Saito, Hideki wrote: > > > > >It may not be a full solution for the problems you're trying to solve > > > > If we are inventing a new solution, I’d like it also to solve OpenMP > declare simd legalization issue. If a small extension of existing scheme > > works for mathlib only, I’m happy to take that and discuss OpenMP >

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin
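
The two `sin` variants differ only in the ISA letter of their mangled names. A small decoder, assuming the `_ZGV<isa><mask><vlen><params>_<name>` layout of the x86 Vector Function ABI used by libmvec, makes that visible. The letter table covers only the common x86 ISA codes, the parameter string is left undecoded, and `decode_vector_name` is a hypothetical helper, not an LLVM or glibc function.

```python
import re

# ISA letters from the x86 Vector Function ABI (as used by glibc libmvec).
X86_ISA = {"b": "SSE", "c": "AVX", "d": "AVX2", "e": "AVX-512"}

# _ZGV <isa> <mask N|M> <vlen> <params> _ <scalar name>
_VECTOR_NAME = re.compile(r"^_ZGV([a-z])([NM])(\d+)([^_]*)_(.+)$")


def decode_vector_name(mangled):
    """Split a vector-variant name such as _ZGVcN4v_sin into its fields.

    Returns None if the name does not have the expected shape.
    """
    m = _VECTOR_NAME.match(mangled)
    if m is None:
        return None
    isa, mask, vlen, params, scalar = m.groups()
    return {
        "isa": X86_ISA.get(isa, "unknown(" + isa + ")"),
        "masked": mask == "M",   # N = unmasked, M = masked variant
        "vlen": int(vlen),       # number of lanes
        "params": params,        # e.g. "v" = one vector parameter
        "scalar_name": scalar,
    }


for name in ("_ZGVcN4v_sin", "_ZGVdN4v_sin"):
    print(name, decode_vector_name(name))
```

Both names describe the same unmasked 4-lane `sin` with one vector argument; only the ISA field changes, so choosing between them depends on which ISA the target supports rather than on the VF.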

2018 Jul 02

[RFC][VECLIB] how should we legalize VECLIB calls?

It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now. So yes, I think that would allow us to remove the VecLib mappings because we are

2017 Dec 06

[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan

Proposal for Outer Loop Vectorization Implementation Plan ============================================= ===== Goal: ===== Extending Loop Vectorizer (LV) such that it can handle outer loops, via VPlan infrastructure enhancements. Understand the trade-offs in trying to make concurrent progress with moving remaining inner loop vectorization functionality to VPlan infrastructure ===========
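
As a rough illustration of what handling outer loops means, the sketch below contrasts a scalar loop nest with an outer-loop-vectorized form: each of the `vf` lanes handles a different outer iteration `i`, and the inner `j` loop runs once per group in lockstep across lanes, with a mask for the tail group. The row-sum kernel, the lane emulation with Python lists and the function names are assumptions for illustration, not the VPlan-based implementation proposed here.

```python
def row_sums_scalar(a, n, m):
    """Reference loop nest: out[i] = sum of a[i][0..m)."""
    out = [0] * n
    for i in range(n):          # outer loop
        acc = 0
        for j in range(m):      # inner loop
            acc += a[i][j]
        out[i] = acc
    return out


def row_sums_outer_vectorized(a, n, m, vf=4):
    """Hypothetical outer-loop vectorization: lanes = outer iterations."""
    out = [0] * n
    for i0 in range(0, n, vf):
        lanes = range(i0, i0 + vf)
        mask = [i < n for i in lanes]   # tail mask for the last group
        acc = [0] * vf                  # one accumulator per lane
        for j in range(m):              # inner loop, uniform across lanes
            # a[i][j] across lanes is strided in i, i.e. a gather.
            col = [a[i][j] if on else 0 for i, on in zip(lanes, mask)]
            acc = [x + y for x, y in zip(acc, col)]
        for lane, i in enumerate(lanes):
            if mask[lane]:
                out[i] = acc[lane]
    return out


a = [[10 * i + j for j in range(3)] for i in range(6)]
print(row_sums_outer_vectorized(a, 6, 3))
```

The inner loop body stays uniform across lanes, which is what makes the outer loop a vectorization candidate; the price is that the per-lane column access becomes a gather (or strided load) instead of a contiguous vector load.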

2016 Jun 30

[Proposal][RFC] Strided Memory Access Vectorization

One common concern raised for cases where the Loop Vectorizer generates bigger types than the target supports: based on the VF, we currently check the cost and generate the expected set of instruction[s] for the bigger type. This has two challenges: for bigger types the cost is not always correct, and code generation may not generate efficient instruction[s]. Probably can depend on the support provided by below RFC by
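
A simplified model of why "bigger types than the target supports" complicate costing: a `<vf x iN>` value wider than a register has to be split into several legal-width pieces, so each vector operation becomes several machine operations. The sketch below only counts those pieces; it assumes pure type splitting with a power-of-two VF and ignores promotion, widening and the shuffles that real legalization may add. `split_parts` is a hypothetical helper.

```python
def split_parts(vf, elt_bits, reg_bits):
    """Hypothetical model: how many legal registers a <vf x i{elt_bits}>
    value needs, and how many lanes each piece holds.

    Assumes pure splitting (no promotion/widening) and a power-of-two vf.
    """
    if vf <= 0 or vf & (vf - 1):
        raise ValueError("vf must be a power of two")
    total = vf * elt_bits
    if total <= reg_bits:
        return 1, vf
    parts = -(-total // reg_bits)  # ceiling division
    return parts, vf // parts


# <64 x i32> on a 512-bit register file
print(split_parts(64, 32, 512))
```

A `<64 x i32>` value on 512-bit registers becomes four `<16 x i32>` pieces, so a per-instruction cost model at least has to scale by the part count; the concern in the post is that even this scaling does not always match the instructions actually generated.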

2017 Dec 14

[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan

>Another might be to introduce changes under feature flags to ease the revert/reintroduce/revert cycle. This is essentially the first guard. We plan to have flags/settings to control which types of outer loops are handled. The new code path is initially exclusive to outer loop vectorization. If we disable all types of outer loops (and that's the initial default), LV continues to be good

2016 Jun 18

[Proposal][RFC] Strided Memory Access Vectorization

> Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of vectorizer: If vectorizer is the best place to perform the optimization, it should do so. This includes the cases like

2018 Jun 29

[RFC][VECLIB] how should we legalize VECLIB calls?

Ashutosh, Thanks for the reply. A related earlier discussion on this appears in the review of the SVML patch (@mmasten). Adding a few names from there. https://reviews.llvm.org/D19544 There, I see Hal's review comment "let's start only with the directly-legal calls". Apparently, what we have right now in the trunk is "not legal enough". I'll work on the patch to stop

2017 Dec 06

[LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us

Status Update on VPlan ---- where we are currently, and what's ahead of us ========================================================== Goal: ----- Extending Loop Vectorizer (LV) such that it can handle outer loops, via uplifting its infrastructure with VPlan. The goal of this status update is to summarize the progress and the future steps needed. Background: ----------- This is related to

2016 Jun 15

[Proposal][RFC] Strided Memory Access Vectorization

Sorry for the spam. Copy-paste didn't capture the Subject properly. Resending with the correct Subject so that the thread is captured properly. -----Original Message----- From: Saito, Hideki Sent: Wednesday, June 15, 2016 1:39 PM To: 'llvm-dev at lists.llvm.org' <llvm-dev at lists.llvm.org> Subject: RE: [llvm-dev] [Proposal][RFC] Strided Memory Access Ashutosh, First,

2018 Jan 15

[RFC][LV][VPlan] Proposal for Outer Loop Vectorization Implementation Plan

To revive the discussion around vectorizer testing, here's a quick sample of a few of the issues hit recently in the loop vectorizer. I want to be careful to say that I am not stating these are the result of any recent work, just that they're issues that have been triaged down to the loop vectorizer doing something incorrect or questionable from a performance perspective.

2016 Jun 30

[Proposal][RFC] Strided Memory Access Vectorization

As a strong advocate of logical vector representation, I'm counting on the community liking Michael's RFC so that it proceeds sooner rather than later. I plan to pitch in (e.g., perf experiments). >Probably can depend on the support provided by below RFC by Michael: > "Allow loop vectorizer to choose vector widths that generate illegal types" >In that case Loop Vectorizer will

2018 Jul 04

[RFC][VECLIB] how should we legalize VECLIB calls?

Hi, On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev < llvm-dev at lists.llvm.org> wrote: > + llvm-dev > > -----Original Message----- > From: Nema, Ashutosh > Sent: Wednesday, July 4, 2018 12:12 PM > To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki <hideki.saito at intel.com>; > Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com

2018 Jul 24

KNL Vectorization with larger vector width

Hello, I need help here. I am able to adjust the vector width through the WidestRegister value. When the number of iterations is 31 and I set the vector width to 32, it gives <16xi32> and <8xi32> instructions. However, if I replicate the same setup with 63 iterations and a vector width of 64, no vector instructions are emitted. It should behave as before and give <32xi32> and

2018 Jan 06

RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)

Amara, >I support this direction Thanks for the support. >but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early bailouts in some form. It's not like I have specific application code in
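
Scalarising a gather means replacing one vector instruction with a per-lane test and scalar load, which is why its cost grows roughly linearly with the VF. The plain-Python model below shows that expansion for a masked gather with a pass-through value; the function name and argument layout are illustrative assumptions, not LLVM's intrinsic signature.

```python
def masked_gather_scalarized(base, indices, mask, passthru):
    """Hypothetical lane-by-lane expansion of a masked gather.

    Active lanes load base[idx]; inactive lanes keep the pass-through
    value and must not touch memory at all.
    """
    result = list(passthru)
    for lane, (idx, active) in enumerate(zip(indices, mask)):
        if active:  # one branch + one scalar load per lane
            result[lane] = base[idx]
    return result


base = [10, 20, 30, 40, 50]
# Lane 2 is masked off, so its out-of-range index 99 is never dereferenced.
print(masked_gather_scalarized(base, [4, 0, 99, 2], [True, True, False, True], [-1] * 4))
```

Skipping inactive lanes is what keeps the out-of-range index in lane 2 harmless; a scalarised expansion has to preserve that per-lane guard, which adds branches on top of the scalar loads.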

2017 Aug 07

VBROADCAST Implementation Issues

Thank you. I'm still getting errors. I have modified my instructions as you suggested, as follows: def GATHER_256B : I<0x68, MRMSrcMem, (outs VR_2048:$dst, VK64WM:$mask_wb), (ins VR_2048:$src1, VK64WM:$mask, i2048mem:$src2), "GATHER_256B\t{$src2, {$dst} {${mask}}|${dst} {${mask}}, $src2}", [(set VR_2048:$dst, VK64WM:$mask_wb, (v64i32 (masked_gather