Displaying 20 results from an estimated 10000 matches similar to: "LoopVectorizer: shufflevectors"
2020 Jan 30
7
[RFC] Extending shufflevector for vscale vectors (SVE etc.)
Currently, for scalable vectors, only splat shuffles are allowed; we're considering allowing more kinds of shuffles. The issue is, essentially, that a shuffle mask is a simple list of integers, and that isn't enough to express a scalable operation. For example, concatenating two fixed-length vectors currently looks like this:
shufflevector <2 x i32> %v1, <2 x i32>
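The snippet above is cut off, so as a rough illustration of the fixed-length mask semantics it refers to, here is a minimal Python sketch (a model, not LLVM code). The helper name `shufflevector` and the sample values are assumptions made for this example; undef mask elements are not modelled. Each mask index selects from the concatenation of the two inputs, so concatenating two 2-element vectors corresponds to the mask <0, 1, 2, 3>.

```python
def shufflevector(v1, v2, mask):
    """Toy model of a fixed-length shufflevector.

    Each mask index selects one element from the concatenation of v1 and v2.
    This is an illustrative helper, not an LLVM API.
    """
    if len(v1) != len(v2):
        raise ValueError("both input vectors must have the same length")
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


# Hypothetical sample inputs.
v1 = [10, 11]
v2 = [20, 21]

# Concatenation: the mask simply enumerates every element of both inputs.
concat = shufflevector(v1, v2, [0, 1, 2, 3])

# Splat of element 0: the only kind of shuffle the post says is currently
# allowed for scalable vectors.
splat = shufflevector(v1, v2, [0, 0, 0, 0])
```

The mask length fixes the result length, which is why a plain list of integers cannot describe a result whose length depends on vscale.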
2020 Jan 30
2
[RFC] Extending shufflevector for vscale vectors (SVE etc.)
On Thu, 30 Jan 2020 at 08:22, Nicolai Hähnle via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> This fixed list of shuffles makes me uncomfortable, and I wonder if
> there isn't a much simpler solution to the problem. Specifically,
> allow the IR form:
>
> %result = shufflevector <vscale x n x TY> %v1, <vscale x n x TY> %v2,
> <m x i32> <mask>
2020 Feb 07
2
[RFC] Extending shufflevector for vscale vectors (SVE etc.)
> -----Original Message-----
> From: Chris Lattner <clattner at nondot.org>
> Sent: Wednesday, February 5, 2020 4:02 PM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale vectors
> (SVE etc.)
>
> On Jan 29, 2020, at 4:48 PM, Eli Friedman via
2016 Oct 06
2
LoopVectorizer -- generating bad and unhandled shufflevector sequence
Hi,
I have experimented with enabling the LoopVectorizer for SystemZ. I have
come across a loop which, when vectorized, seems to have been poorly
generated. In short, there seems to be a completely unnecessary sequence
of shufflevector instructions, that doesn't get optimized away anywhere.
In other words, there is a shuffle sequence that leads back to the original
vector:
[0 1 2 3
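The masks in the post are truncated here, so the following is only a hedged Python sketch of the general idea: two single-input shuffles applied back to back are equivalent to one shuffle with a composed mask, and the pair is redundant when that composed mask is the identity. The function names and the `swap_pairs` mask are hypothetical and are not taken from the original message.

```python
def compose_masks(first, second):
    """Return the single mask equivalent to applying `first`, then `second`.

    For single-input shuffles, out[i] = v[first[second[i]]].
    """
    return [first[i] for i in second]


def is_identity(mask):
    """True if the mask leaves every element in place."""
    return mask == list(range(len(mask)))


# Hypothetical mask: swap the elements within each adjacent pair.
swap_pairs = [1, 0, 3, 2]

# Applying the swap twice leads back to the original vector.
round_trip = compose_masks(swap_pairs, swap_pairs)
```

A check of this kind is one way such a sequence could be recognised as removable; whether and where LLVM performs it is not stated in the snippet.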
2018 Jul 24
2
[LoopVectorizer] Improving the performance of dot product reduction loop
On 07/24/2018 02:58 AM, Nema, Ashutosh wrote:
>
> *From:*Hal Finkel <hfinkel at anl.gov>
> *Sent:* Tuesday, July 24, 2018 5:05 AM
> *To:* Craig Topper <craig.topper at gmail.com>; hideki.saito at intel.com;
> estotzer at ti.com; Nemanja Ivanovic <nemanja.i.ibm at gmail.com>; Adam
> Nemet <anemet at apple.com>; graham.hunter at
2016 Oct 10
2
[arm, aarch64] Alignment checking in interleaved access pass
Hi Renato,
Thank you for the answers!
First, let me clarify a couple of things and give some context.
The patch is looking at VSTn, rather than VLDn (stores seem to be somewhat
harder to get the "right" patterns for; the pass is doing a good job for loads
already).
The examples you gave come mostly from loop vectorization, which, as I
understand it, was the reason for adding the
2020 Feb 08
2
[RFC] Extending shufflevector for vscale vectors (SVE etc.)
> -----Original Message-----
> From: Chris Lattner <clattner at nondot.org>
> Sent: Friday, February 7, 2020 3:00 PM
> To: Eli Friedman <efriedma at quicinc.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale vectors
> (SVE etc.)
>
> > On Feb 7, 2020, at 12:39 PM, Eli Friedman
2016 Aug 05
3
enabling interleaved access loop vectorization
Hi Michael,
Some time back I ran some experiments with the interleave vectorizer and did not find any degradation,
probably because my tests/benchmarks are not extensive enough to cover much.
Elina is the right person to comment on it, as she has already experienced cases where it hinders performance.
For interleave vectorizer on X86 we do not have any specific costing, it goes to BasicTTI where the costing is not
2016 Aug 29
2
IR canonicalization: vector select or shufflevector?
x86 has also put a lot of effort into shuffle lowering...so much so that it
is its own life-form and brings most online codeviewer apps to their knees
when you try to open X86ISelLowering.cpp. :)
Given that:
1. There are at least 2 targets that lean towards shuffle (Martin's comment
+ x86 uses lowerVSELECTtoVectorShuffle() for all cases like the example
posted here)
2. Size-changing shuffles
2016 May 26
2
enabling interleaved access loop vectorization
Interleaved access is not enabled on X86 yet.
We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutations (shuffle mask). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer
2016 Aug 05
2
enabling interleaved access loop vectorization
Regarding InterleavedAccessPass - sure, but proper strided/interleaved
access optimization ought to have a positive impact even without target
support.
Case in point - Hal enabled it on PPC last September. An important
difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC,
but, as I said below, I hope we can enable it on x86 with a conservative
cost function, and still get
2020 Nov 02
2
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
Hi all,
At the Barcelona Supercomputing Center, we have been working on an
end-to-end vectorizer using scalable vectors for the RISC-V Vector extension
in the context of the EPI Project
<https://www.european-processor-initiative.eu/accelerator/>. We earlier
shared a demo of our prototype implementation
(https://repo.hca.bsc.es/epic/z/9eYRIF, see below) with the folks
involved with LLVM
2016 Aug 29
2
IR canonicalization: vector select or shufflevector?
I have a slight preference towards shufflevector, because it makes
sequences of shuffles, where only some of the shuffles can be converted
into selects (because the input and output vector sizes of the others don't
match) simpler to reason about.
I'm not sure this is a particularly good reason, though.
On Mon, Aug 29, 2016 at 8:19 AM, Philip Reames via llvm-dev <
llvm-dev at
2009 Feb 24
2
[LLVMdev] [llvm-commits] [llvm] r65296 - in /llvm/trunk: include/llvm/CodeGen/ lib/CodeGen/SelectionDAG/ lib/Target/CellSPU/ lib/Target/PowerPC/ lib/Target/X86/ test/CodeGen/X86/
On Feb 23, 2009, at 6:13 PM, Scott Michel wrote:
> On Mon, Feb 23, 2009 at 4:03 PM, Nate Begeman <natebegeman at me.com>
> wrote:
>
> It's basically as Chris said; there will be a ShuffleVectorSDNode,
> and appropriate helper functions, node profile, and DAGCombiner
> support.
>
> Fine. For vector shuffles. But again, what about vector constants,
>
2017 Mar 17
3
LoopVectorizer with ifconversion
On 17 March 2017 at 16:34, Hal Finkel <hfinkel at anl.gov> wrote:
> In general, this is true everywhere. In a large vectorized loop, this cost
> may well be worthwhile. The idea is that the cost model should account for
> all of these costs. If it doesn't properly, we should fix that.
Isn't this only worthwhile when the SIMD instructions can be
conditionalised per lane? I
2017 Jan 13
2
IR canonicalization: shufflevector or vector trunc?
Right - I think that case looks like this for little endian:
define <2 x i32> @zextshuffle(<2 x i16> %x) {
%zext_shuffle = shufflevector <2 x i16> %x, <2 x i16> zeroinitializer, <4 x i32> <i32 0, i32 2, i32 1, i32 2>
%bc = bitcast <4 x i16> %zext_shuffle to <2 x i32>
ret <2 x i32> %bc
}
define <2 x i32> @zextvec(<2 x i16>
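To make the little-endian reasoning behind `@zextshuffle` concrete, here is a small Python sketch that models the shuffle and the bitcast with byte packing. It is a model under stated assumptions, not LLVM code: the function names are made up for this example, and the `"<"` format prefix hard-codes the little-endian layout the post assumes.

```python
import struct


def zext_via_shuffle(x):
    """Model of @zextshuffle for two unsigned 16-bit lanes (little endian)."""
    combined = list(x) + [0, 0]                     # %x followed by zeroinitializer
    shuffled = [combined[i] for i in (0, 2, 1, 2)]  # mask <0, 2, 1, 2>
    raw = struct.pack("<4H", *shuffled)             # the <4 x i16> value as bytes
    return list(struct.unpack("<2I", raw))          # bitcast to <2 x i32>


def zext_direct(x):
    """Reference: zero-extend each 16-bit lane to 32 bits."""
    return [v & 0xFFFF for v in x]


# Hypothetical sample input.
sample = [0x1234, 0xFFFF]
via_shuffle = zext_via_shuffle(sample)
direct = zext_direct(sample)
```

Interleaving each lane with a zero lane puts the zero in the high half of each 32-bit element on a little-endian target. On a big-endian layout the same mask would place the value in the high half instead, which is why the post qualifies the example.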
2017 Mar 17
2
LoopVectorizer with ifconversion
Hi,
it seems to be generally a bad idea to enable vectorization of
conditional stores on SystemZ, because it will cost extra instructions
to 1. extract the compare result element, 2. do a test-under-mask
instruction on that element, and 3. conditionally branch past the store block.
Ideally, I would like to adjust the cost for the vector compare. I am
not sure if this is feasible since I would need
2017 Jan 17
2
IR canonicalization: shufflevector or vector trunc?
We use InstCombiner::ShouldChangeType() to prevent transforms to illegal
integer types, but I'm not sure how that would apply to vector types.
I.e., let's say v256 is a legal type in your example. DataLayout doesn't
appear to specify what configurations of a 256-bit vector are legal, so I
don't think we can currently use that to say v2i128 should be treated
differently than v16i16.
2016 Aug 05
2
enabling interleaved access loop vectorization
On 5 August 2016 at 21:00, Demikhovsky, Elena
<elena.demikhovsky at intel.com> wrote:
> As far as I remember, maybe I’m wrong, the vectorizer does not generate
> shuffles for interleaved access. It generates a bunch of extracts and inserts
> that ought to be coupled into shuffles afterwards.
That's my understanding as well.
Whatever strategy we take, it will be a mix of telling
2018 Feb 08
0
[RFC] Make LoopVectorize Aware of SLP Operations
Hi Florian!
This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some