similar to: RFC: Generic IR reductions

Displaying 20 results from an estimated 10000 matches similar to: "RFC: Generic IR reductions"

2017 Jan 31
0
RFC: Generic IR reductions
Hi Amara, We also had some discussions on the SVE side of reductions on the main SVE thread, but this description is much more detailed than we had before. I don't want to discuss specifically about SVE, as the spec is not out yet, but I think we can cover a lot of ground until very close to SVE and do the final step when we get there. On 31 January 2017 at 17:27, Amara Emerson via
2017 Jan 31
4
RFC: Generic IR reductions
+cc Simon who's also interested in reductions for the any_true, all_true predicate vectors. On 31 January 2017 at 20:19, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Hi Amara, > > We also had some discussions on the SVE side of reductions on the main > SVE thread, but this description is much more detailed than we had > before. > > I don't
2017 Feb 02
3
RFC: Generic IR reductions
Thanks for the summary, some more comments inline. On 1 February 2017 at 22:02, Renato Golin <renato.golin at linaro.org> wrote: > On 1 February 2017 at 21:22, Saito, Hideki <hideki.saito at intel.com> wrote: >> I think we are converging enough at the detail level, but having a big >> difference in the opinions at the "vision" level. :) > > Vision is
2017 Feb 01
2
RFC: Generic IR reductions
> One that we have had multiple times and the usual consensus is: if it can be represented in plain IR, it must. Adding multiple semantics for the same concept, especially stiff ones like builtins, adds complexity to the optimiser. > Regardless of the merits in this case, builtins should only be introduced IFF there is no other way. So first we should discuss adding it to IR with generic
2017 Feb 01
2
RFC: Generic IR reductions
Hi All, Renato wrote: >As they say: if it's not broken, don't fix it. >Let's talk about the reductions that AVX512 and SVE can't handle with IR semantics, but let's not change the current IR semantics for no reason. Main problem for SVE: We can't write straight-line IR instruction sequence for reduction last value compute, without knowing #elements in vector to
2017 Feb 01
2
RFC: Generic IR reductions
Hi, Renato. >So I vote "it depends". :) My preference is to let vectorizer emit one kind of "reduce vector into scalar" instead of letting vectorizer choose one of many different ways. I'm perfectly fine with @llvm.reduce.op.typein.typeout.ordered?(%vector) being that "one kind of reduce vector into scalar". I think we are converging enough at the detail
2019 May 16
4
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
Hello again, I've been meaning to follow up on this thread for the last couple of weeks, my apologies for the delay. To summarise the feedback on the proposal for vector.reduce.fadd/fmul: There seems to be consensus to keep the explicit start value to better accommodate chained reductions (as opposed to generating IR that performs the reduction of the first element using extract/fadd/insert
2017 Feb 01
2
RFC: Generic IR reductions
Constant propagation: %sum = add <N x float> %a, %b @llvm.reduce(ext <N x double> %sum) if %a and %b are vector of constants, the %sum also becomes a vector of constants. At this point you have @llvm.reduce(ext <N x double> %sum) and don't know what kind of reduction do you need. - Elena -----Original Message----- From: Renato Golin [mailto:renato.golin at linaro.org]
2017 Feb 03
2
RFC: Generic IR reductions
Yes, SVE can vectorize early exit loops by using speculative (first-faulting) loads, which essentially give a predicate of the lanes loaded successfully. For uncounted loops with these special loads, the loop predicate tests can be done using a 'ptest' instruction, checking if the last element is active. Amara On 3 February 2017 at 10:15, Simon Pilgrim <llvm-dev at redking.me.uk>
2019 Apr 10
2
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
> On 8 Apr 2019, at 11:37, Simon Moll <moll at cs.uni-saarland.de> wrote: > > Hi, > > On 4/5/19 10:47 AM, Simon Pilgrim via llvm-dev wrote: >> On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote: >>> On 04/04/2019 14:11, Sander De Smalen wrote: >>>> Proposed change: >>>> ---------------------------- >>>> In this RFC I
2017 Feb 01
2
RFC: Generic IR reductions
> My proposal was to have a reduction intrinsic that can infer the type by the predecessors. > For example: > @llvm.reduce(ext <N x double> ( add <N x float> %a, %b)) And if we don't have %b? We just want to sum all elements of %a? Something like @llvm.reduce(ext <N x double> ( add <N x float> %a, zeroinitializer)) Don't we have a problem with constant
2016 Nov 04
2
[RFC] Supporting ARM's SVE in LLVM
Hi, We've been working for the last two years on support for ARM's Scalable Vector Extension in LLVM, and we'd like to upstream our work. We've had to make several design decisions without community input, and would like to discuss the major changes we've made. To help with the discussions, I've attached a technical document (also in plain text below) to describe the
2017 Feb 01
2
RFC: Generic IR reductions
> If you mean "patterns may not be matched, and reduction instructions will not be generated, making the code worse", then this is just a matter of making the patterns obvious and the back-ends robust enough to cope with it, no? The Back-end should be as robust as possible, I agree. The problem that I see is in adding another kind of complexity to the optimizer that works between the
2017 Aug 04
3
Status of llvm.experimental.vector.reduce.* intrinsics
I assume smaller types like <4 x i1> are getting zero extended to e.g., i8? Am 04.08.2017 um 15:58 schrieb Amara Emerson: > Actually for mask vectors of i1 values, you don't need to use reductions > at all(although for SVE this is what we'll do). You can instead bitcast > the vector value to an i8/i16/whatever and then compare against zero. > > Amara > > On
2017 Aug 04
2
Status of llvm.experimental.vector.reduce.* intrinsics
I am currently working on a transformation pass that transforms masked.load and masked.store intrinsics to (hopefully) increase performance on targets where masked.load and masked.store are not legal. To check if the loads and stores are necessary at all I take the mask for the masked operations and want to reduce them to a single value. vector.reduce.or seemed very handy to do the job. I
2019 Apr 05
4
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote: > On 04/04/2019 14:11, Sander De Smalen wrote: >> Proposed change: >> >> ---------------------------- >> >> In this RFC I propose changing the intrinsics for >> llvm.experimental.vector.reduce.fadd and >> llvm.experimental.vector.reduce.fmul (see options A and B). I also >> propose renaming
2017 Jun 07
2
[RFC][SVE] Supporting Scalable Vector Architectures in LLVM IR (take 2)
Hi Renato, Thanks for taking a look. Answers inline below, let me know if I've missed something out. -Graham > On 5 Jun 2017, at 17:55, Renato Golin <renato.golin at linaro.org> wrote: > > Hi Graham, > > Just making sure some people who voiced concerns are copied + cfe-dev. > > On 1 June 2017 at 15:22, Graham Hunter via llvm-dev > <llvm-dev at
2018 Jul 02
3
[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths
Hi, i am the main author of RV, the Region Vectorizer (github.com/cdl-saarland/rv). I want to share our standpoint as potential users of the proposed vector-length agnostic IR (RISC-V, ARM SVE). -- support for `llvm.experimental.vector.reduce.*` intrinsics -- RV relies heavily on predicate reductions (`or` and `and` reduction) to tame divergent loops and provide a vector-length agnostic
2020 Nov 11
3
An update on scalable vectors in LLVM
Hi all, It's been a while since we've given an update on scalable vector support in LLVM. Over the last 12 months a lot of work has been done to make LLVM cope with scalable vectors. This effort is now starting to bear fruit with LLVM gaining more capabilities, including an intrinsics interface for AArch64 SVE/SVE2, LLVM IR Codegen for scalable vectors, and several loop-vectorization
2019 Apr 04
5
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
Hi all, While working on a patch to improve codegen for fadd reductions on AArch64, I stumbled upon an outstanding issue with the experimental.vector.reduce.fadd/fmul intrinsics where the accumulator argument is ignored when fast-math is set on the intrinsic call. This behaviour is described in the LangRef (https://www.llvm.org/docs/LangRef.html#id1905) and is mentioned in