thr3ads.net - similar to: "LoopVectorizer: shufflevectors"

Displaying 20 results from an estimated 10000 matches similar to: "LoopVectorizer: shufflevectors"

[RFC] Extending shufflevector for vscale vectors (SVE etc.)

2020 Jan 30

Currently, for scalable vectors, only splat shuffles are allowed; we're considering allowing more kinds of shuffles. The issue is, essentially, that a shuffle mask is a simple list of integers, and that isn't enough to express a scalable operation. For example, concatenating two fixed-length vectors currently looks like this: shufflevector <2 x i32> %v1, <2 x i32>
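The point about the mask being a plain list of integers can be illustrated with a small Python model of fixed-length shufflevector semantics. This is only a sketch: the helper names `shufflevector` and `concat_mask` are hypothetical, and the concatenation mask shown is an illustrative assumption, since the snippet above is cut off before its mask.

```python
def shufflevector(v1, v2, mask):
    """Model a fixed-length shufflevector: each mask index selects one
    element from the concatenation of v1 and v2."""
    if len(v1) != len(v2):
        raise ValueError("both inputs must have the same length")
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


def concat_mask(n):
    """Mask that concatenates two n-element vectors: 0, 1, ..., 2n-1."""
    return list(range(2 * n))


v1, v2 = [10, 11], [20, 21]
concatenated = shufflevector(v1, v2, concat_mask(2))
```

In this model the mask length is tied to the element count n. For a scalable vector the element count is a multiple of vscale that is unknown at compile time, so no fixed list of integers can play the role of `concat_mask(n)`.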

[RFC] Extending shufflevector for vscale vectors (SVE etc.)

2020 Jan 30

On Thu, 30 Jan 2020 at 08:22, Nicolai Hähnle via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> This fixed list of shuffles makes me uncomfortable, and I wonder if
> there isn't a much simpler solution to the problem. Specifically,
> allow the IR form:
>
> %result = shufflevector <vscale x n x TY> %v1, <vscale x n x TY> %v2,
> <m x i32> <mask>

[RFC] Extending shufflevector for vscale vectors (SVE etc.)

2020 Feb 07

> -----Original Message----- > From: Chris Lattner <clattner at nondot.org> > Sent: Wednesday, February 5, 2020 4:02 PM > To: Eli Friedman <efriedma at quicinc.com> > Cc: llvm-dev <llvm-dev at lists.llvm.org> > Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale vectors > (SVE etc.) > > On Jan 29, 2020, at 4:48 PM, Eli Friedman via

LoopVectorizer -- generating bad and unhandled shufflevector sequence

2016 Oct 06

Hi, I have experimented with enabling the LoopVectorizer for SystemZ. I have come across a loop which, when vectorized, seems to have been poorly generated. In short, there seems to be a completely unnecessary sequence of shufflevector instructions that doesn't get optimized away anywhere. In other words, the shuffles lead back to the original vector: [0 1 2 3
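A redundant shuffle sequence of the kind described here can be detected by composing the masks and checking whether the result is the identity. The sketch below models single-input shuffles only; the masks are hypothetical examples, not the ones from the truncated message.

```python
def apply_mask(vec, mask):
    """Apply a single-input shuffle: result[k] = vec[mask[k]]."""
    return [vec[i] for i in mask]


def compose(first, second):
    """Return the one mask equivalent to applying `first`, then `second`."""
    return [first[i] for i in second]


def is_identity(mask):
    """True if the mask leaves every lane where it started."""
    return mask == list(range(len(mask)))


reverse = [3, 2, 1, 0]                 # hypothetical lane reversal
round_trip = compose(reverse, reverse)  # reversing twice is a no-op
```

If `is_identity(round_trip)` holds, both shuffles can be dropped and the original vector used directly.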

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

On 07/24/2018 02:58 AM, Nema, Ashutosh wrote:
>
> *From:* Hal Finkel <hfinkel at anl.gov>
> *Sent:* Tuesday, July 24, 2018 5:05 AM
> *To:* Craig Topper <craig.topper at gmail.com>; hideki.saito at intel.com;
> estotzer at ti.com; Nemanja Ivanovic <nemanja.i.ibm at gmail.com>; Adam
> Nemet <anemet at apple.com>; graham.hunter at

[arm, aarch64] Alignment checking in interleaved access pass

2016 Oct 10

Hi Renato, Thank you for the answers! First, let me clarify a couple of things and give some context. The patch is looking at VSTn, rather than VLDn (for stores it seems somewhat harder to get the "right" patterns; the pass is already doing a good job for loads). The examples you gave come mostly from loop vectorization, which, as I understand it, was the reason for adding the

[RFC] Extending shufflevector for vscale vectors (SVE etc.)

2020 Feb 08

> -----Original Message----- > From: Chris Lattner <clattner at nondot.org> > Sent: Friday, February 7, 2020 3:00 PM > To: Eli Friedman <efriedma at quicinc.com> > Cc: llvm-dev <llvm-dev at lists.llvm.org> > Subject: [EXT] Re: [llvm-dev] [RFC] Extending shufflevector for vscale vectors > (SVE etc.) > > > On Feb 7, 2020, at 12:39 PM, Eli Friedman

enabling interleaved access loop vectorization

2016 Aug 05

Hi Michael, Some time back I did some experiments with the interleave vectorizer and did not find any degradation; probably my tests/benchmarks are not extensive enough to cover much. Elina is the right person to comment on it, as she has already experienced cases where it hinders performance. For the interleave vectorizer on X86 we do not have any specific costing; it goes to BasicTTI where the costing is not

IR canonicalization: vector select or shufflevector?

2016 Aug 29

x86 has also put a lot of effort into shuffle lowering...so much so that it is its own life-form and brings most online codeviewer apps to their knees when you try to open X86ISelLowering.cpp. :) Given that: 1. There are at least 2 targets that lean towards shuffle (Martin's comment + x86 uses lowerVSELECTtoVectorShuffle() for all cases like the example posted here) 2. Size-changing shuffles

enabling interleaved access loop vectorization

2016 May 26

Interleaved access is not enabled on X86 yet. We looked at this feature and came to the conclusion that interleaving (as loads + shuffles) is not always profitable on X86. We should provide the right cost, which depends on the number of shuffles. The number of shuffles depends on the permutations (shuffle masks). And even if we estimate the number of shuffles, the shuffles are not generated in-place. Vectorizer
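To make the "loads + shuffles" form concrete, here is a minimal Python sketch of how one wide load of an interleaved group is split back into its members with strided masks. The function names `stride_mask` and `deinterleave` are hypothetical, and the model ignores how a target actually lowers each mask, which is the part the cost discussion is about.

```python
def stride_mask(start, stride, vf):
    """Mask selecting every `stride`-th lane, beginning at lane `start`."""
    return [start + i * stride for i in range(vf)]


def deinterleave(wide, factor):
    """Split one wide vector into `factor` member vectors, one mask each."""
    vf = len(wide) // factor
    return [[wide[i] for i in stride_mask(k, factor, vf)]
            for k in range(factor)]


# A hypothetical group of factor 3 (for example, packed R, G, B values).
wide = ["r0", "g0", "b0", "r1", "g1", "b1", "r2", "g2", "b2", "r3", "g3", "b3"]
red, green, blue = deinterleave(wide, 3)
```

In this model each member of the group needs one mask, so the mask count grows with the interleave factor; how many machine shuffles a given mask costs is target-specific and not modeled here.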

enabling interleaved access loop vectorization

2016 Aug 05

Regarding InterleavedAccessPass - sure, but proper strided/interleaved access optimization ought to have a positive impact even without target support. Case in point - Hal enabled it on PPC last September. An important difference vs. x86 seems to be that arbitrary shuffles are cheap on PPC, but, as I said below, I hope we can enable it on x86 with a conservative cost function, and still get

Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)

2020 Nov 02

Hi all, At the Barcelona Supercomputing Center, we have been working on an end-to-end vectorizer using scalable vectors for the RISC-V Vector extension in the context of the EPI Project <https://www.european-processor-initiative.eu/accelerator/>. We earlier shared a demo of our prototype implementation (https://repo.hca.bsc.es/epic/z/9eYRIF, see below) with the folks involved with LLVM

IR canonicalization: vector select or shufflevector?

2016 Aug 29

I have a slight preference towards shufflevector, because it makes sequences of shuffles, where only some of the shuffles can be converted into selects (because the input and output vector sizes of the others don't match) simpler to reason about. I'm not sure this is a particularly good reason, though. On Mon, Aug 29, 2016 at 8:19 AM, Philip Reames via llvm-dev < llvm-dev at
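The equivalence under discussion can be sketched in Python: a vector select with a constant condition corresponds to a shuffle whose mask takes lane i from the first input or lane i + n from the second. The helper names are hypothetical, and the sketch only covers the case where input and output lengths match, which is the case where the conversion to a select is possible.

```python
def shufflevector(v1, v2, mask):
    """Model a fixed-length shufflevector over the concatenation v1 ++ v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


def vector_select(cond, a, b):
    """Lane-wise select: take a[i] where cond[i] is true, else b[i]."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def select_to_shuffle_mask(cond):
    """Shuffle mask equivalent to a select with constant condition `cond`."""
    n = len(cond)
    return [i if c else i + n for i, c in enumerate(cond)]


cond = [True, False, False, True]
a, b = [1, 2, 3, 4], [10, 20, 30, 40]
mask = select_to_shuffle_mask(cond)
```

A mask that moves an element to a different lane, or whose length differs from the input length, has no select equivalent in this model.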

[LLVMdev] [llvm-commits] [llvm] r65296 - in /llvm/trunk: include/llvm/CodeGen/ lib/CodeGen/SelectionDAG/ lib/Target/CellSPU/ lib/Target/PowerPC/ lib/Target/X86/ test/CodeGen/X86/

2009 Feb 24

On Feb 23, 2009, at 6:13 PM, Scott Michel wrote: > On Mon, Feb 23, 2009 at 4:03 PM, Nate Begeman <natebegeman at me.com> > wrote: > > It's basically as Chris said; there will be a ShuffleVectorSDNode, > and appropriate helper functions, node profile, and DAGCombiner > support. > > Fine. For vector shuffles. But again, what about vector constants, >

LoopVectorizer with ifconversion

2017 Mar 17

On 17 March 2017 at 16:34, Hal Finkel <hfinkel at anl.gov> wrote:
> In general, this is true everywhere. In a large vectorized loop, this cost
> may well be worthwhile. The idea is that the cost model should account for
> all of these costs. If it doesn't properly, we should fix that.

Isn't this only worthwhile when the SIMD instructions can be conditionalised per lane? I

IR canonicalization: shufflevector or vector trunc?

2017 Jan 13

Right - I think that case looks like this for little endian:

define <2 x i32> @zextshuffle(<2 x i16> %x) {
  %zext_shuffle = shufflevector <2 x i16> %x, <2 x i16> zeroinitializer, <4 x i32> <i32 0, i32 2, i32 1, i32 2>
  %bc = bitcast <4 x i16> %zext_shuffle to <2 x i32>
  ret <2 x i32> %bc
}

define <2 x i32> @zextvec(<2 x i16>
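Why the zextshuffle function amounts to a zero-extension on a little-endian target can be checked with a small Python model. The helper names are hypothetical; the mask <0, 2, 1, 2> and the bitcast from <4 x i16> to <2 x i32> are taken from the snippet.

```python
def shufflevector(v1, v2, mask):
    """Model a fixed-length shufflevector over the concatenation v1 ++ v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]


def bitcast_i16_to_i32_le(lanes):
    """Reinterpret pairs of i16 lanes as i32 lanes, little endian:
    the lower-numbered lane supplies the low 16 bits."""
    return [lanes[i] | (lanes[i + 1] << 16) for i in range(0, len(lanes), 2)]


def zext_via_shuffle(x):
    """Interleave the two i16 inputs with zeros, then bitcast to i32."""
    shuffled = shufflevector(x, [0, 0], [0, 2, 1, 2])
    return bitcast_i16_to_i32_le(shuffled)


widened = zext_via_shuffle([0x1234, 0xFFFF])
```

The shuffle yields [x0, 0, x1, 0], so each i32 lane has the original value in its low half and zero in its high half. On a big-endian target the halves would be swapped, which is why the snippet qualifies the example as little endian.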

LoopVectorizer with ifconversion

2017 Mar 17

Hi, it seems to be generally a bad idea to enable vectorization of conditional stores on SystemZ, because it will cost extra instructions to 1. extract the compare result element, 2. do a test-under-mask instruction on that element, and 3. conditionally branch past the store block. Ideally, I would like to adjust the cost for the vector compare. I am not sure if this is feasible since I would need

IR canonicalization: shufflevector or vector trunc?

2017 Jan 17

We use InstCombiner::ShouldChangeType() to prevent transforms to illegal integer types, but I'm not sure how that would apply to vector types. I.e., let's say v256 is a legal type in your example. DataLayout doesn't appear to specify what configurations of a 256-bit vector are legal, so I don't think we can currently use that to say v2i128 should be treated differently than v16i16.

enabling interleaved access loop vectorization

2016 Aug 05

On 5 August 2016 at 21:00, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
> As far as I remember, maybe I’m wrong, the vectorizer does not generate
> shuffles for interleaved access. It generates a bunch of extracts and inserts
> that ought to be coupled into shuffles afterwards.

That's my understanding as well. Whatever strategy we take, it will be a mix of telling

[RFC] Make LoopVectorize Aware of SLP Operations

2018 Feb 08

Hi Florian! This proposal sounds pretty exciting! Integrating SLP-aware loop vectorization (or the other way around) and SLP into the VPlan framework is definitely aligned with the long term vision and we would prefer this approach to the LoopReroll and InstCombine alternatives that you mentioned. We prefer a generic implementation that can handle complicated cases to something ad-hoc for some
