similar to: [Proposal][RFC] Strided Memory Access Vectorization

Displaying 20 results from an estimated 10000 matches similar to: "[Proposal][RFC] Strided Memory Access Vectorization"

2016 Jun 18
2
[Proposal][RFC] Strided Memory Access Vectorization
>Vectorizer's output should be as clean as vector code can be so that analyses and optimizers downstream can >do a great job optimizing. Guess I should clarify this philosophical position of mine. In terms of vector code optimization that complicates the output of vectorizer: If vectorizer is the best place to perform the optimization, it should do so. This includes the cases like
2016 Jun 30
0
[Proposal][RFC] Strided Memory Access Vectorization
One common concern raised for cases where Loop Vectorizer generate bigger types than target supported: Based on VF currently we check the cost and generate the expected set of instruction[s] for bigger type. It has two challenges for bigger types cost is not always correct and code generation may not generate efficient instruction[s]. Probably can depend on the support provided by below RFC by
2016 Jun 30
1
[Proposal][RFC] Strided Memory Access Vectorization
As a strong advocate of logical vector representation, I'm counting on community liking Michael's RFC and that'll proceed sooner than later. I plan to pitch in (e.g., perf experiments). >Probably can depend on the support provided by below RFC by Michael: > "Allow loop vectorizer to choose vector widths that generate illegal types" >In that case Loop Vectorizer will
2016 Aug 05
3
enabling interleaved access loop vectorization
Hi Michael, Sometime back I did some experiments with interleave vectorizer and did not found any degrade, probably my tests/benchmarks are not extensive enough to cover much. Elina is the right person to comment on it as she already experienced cases where it hinders performance. For interleave vectorizer on X86 we do not have any specific costing, it goes to BasicTTI where the costing is not
2016 Aug 05
2
enabling interleaved access loop vectorization
Regarding InterleavedAccessPass - sure, but proper strided/interleaved access optimization ought to have a positive impact even without target support. Case in point - Hal enabled it on PPC last September. An important difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC, but, as I said below, I hope we can enable it on x86 with a conservative cost function, and still get
2018 Jul 04
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Hi, On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev < llvm-dev at lists.llvm.org> wrote: > + llvm-dev > > -----Original Message----- > From: Nema, Ashutosh > Sent: Wednesday, July 4, 2018 12:12 PM > To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki <hideki.saito at intel.com>; > Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com
2016 May 26
2
enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet. We looked at this feature and got into conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost which depends on number of shuffles. Number of shuffles depends on permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer
2013 Oct 23
2
[LLVMdev] First attempt at recognizing pointer reduction
On 23 October 2013 16:05, Arnold Schwaighofer <aschwaighofer at apple.com>wrote: > In the examples you gave there are no reduction variables in the loop > vectorizer’s sense. But, they all have memory accesses that are strided. > This is what I don't get. As far as I understood, a reduction variable is the one that aggregates the computation done by the loop, and is used
2018 Jan 06
2
RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
Amara, >I support this direction Thanks for the support. >but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early >bailouts in some form.’ It's not like I have specific application code in
2013 Oct 21
2
[LLVMdev] First attempt at recognizing pointer reduction
Hi Arnold, To sum up my intentions, I want to understand how the reduction/induction variable detection works in LLVM, so that I can know better how to detect different patterns in memory, not just the stride vectorization. For instance, even if the relationship between each loop would be complicated, I know that in each loop, all three reads are sequential. So, at least, I could use a
2018 Jan 07
0
RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
On 01/05/2018 06:28 PM, Saito, Hideki wrote: > Amara, > >> I support this direction > Thanks for the support. > >> but are there actually any real world workloads where gather/scatter scalarisation would be worth it, on any micro-architecture? If we don’t have examples and the compile time cost is non-negligible then I think we’d still like to keep the early >bailouts in
2018 Jul 02
8
[RFC][VECLIB] how should we legalize VECLIB calls?
On 07/02/2018 04:33 PM, Saito, Hideki wrote: > >   > > >It may not be a full solution for the problems you're trying to solve > >   > > If we are inventing a new solution, I’d like it also to solve OpenMP > declare simd legalization issue. If a small extension of existing scheme > > works for mathlib only, I’m happy to take that and discuss OpenMP >
2017 Dec 06
5
[LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us
Status Update on VPlan ---- where we are currently, and what's ahead of us ==========================================================   Goal: ----- Extending Loop Vectorizer (LV) such that it can handle outer loops, via uplifting its infrastructure with VPlan. The goal of this status update is to summarize the progress and the future steps needed.   Background: ----------- This is related to
2018 Jan 09
1
RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
Thanks, Hal. I plan to post a patch w/o HW Legality early bailout first. That should enable further discussion on where the initial very high cost for "illegal masked load/store/gather/scatter" should be coming from --- like should LoopVectorize provide it? Or should it be provided by TTI? I prefer the latter (TTI) but the first revision of the patch will intentionally do the former
2018 Jan 05
2
RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
All, I'm trying to refactor LoopVectorize such that it has better conformance to VPlan vision going forward (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All VP*Recipe class definitions are now moved to VPlan.h, and I have a patch under review to move LoopVectorizationPlanner class out of LoopVectorize.cpp (https://reviews.llvm.org/D41420). Next thing I'm working on is
2018 Jul 23
4
[LoopVectorizer] Improving the performance of dot product reduction loop
~Craig On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: > > Hello all, > > This code https://godbolt.org/g/tTyxpf is a dot product reduction loop > multipying sign extended 16-bit values to produce a 32-bit accumulated > result. The x86 backend is currently not able to optimize it as well as gcc
2018 Jan 05
0
RFC: [LV] any objections in moving isLegalMasked* check from Legal to CostModel? (Cleaning up LoopVectorizationLegality)
> On 5 Jan 2018, at 21:01, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > > All, > > I'm trying to refactor LoopVectorize such that it has better conformance to VPlan vision going forward > (http://www.llvm.org/docs/Proposals/VectorizationPlan.html). All VP*Recipe class definitions are now > moved to VPlan.h, and I have a patch under review
2015 Apr 16
2
[LLVMdev] Code review for gather and scatter intrinsics
Hi Renato, I fully agree with you, but indexed load and store is the next step. I'm asking to review gather and scatter code. Thanks. - Elena -----Original Message----- From: Renato Golin [mailto:renato.golin at linaro.org] Sent: Thursday, April 16, 2015 17:17 To: Demikhovsky, Elena Cc: llvmdev at cs.uiuc.edu; Chandler Carruth; James Molloy Subject: Re: [LLVMdev] Code review for gather
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now. So yes, I think that would allow us to remove the VecLib mappings because we are
2018 Jul 02
2
[RFC][VECLIB] how should we legalize VECLIB calls?
Adding to Ashutosh's comments, We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22). reference: https://sourceware.org/glibc/wiki/libmvec Using the example case given in the reference, we found there are 2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin