similar to: RFC: Promoting experimental reduction intrinsics to first class intrinsics

2020 Sep 09
4
RFC: Promoting experimental reduction intrinsics to first class intrinsics
Proposal to specify semantics for the FP min/max reductions: https://reviews.llvm.org/D87391 I'm not sure how we got to the current state of codegen for those, but it doesn't seem consistent or correct as-is, so I've proposed updates there too. On Wed, Jun 17, 2020 at 2:15 PM Amara Emerson via llvm-dev < llvm-dev at lists.llvm.org> wrote: > Proposed clarification here:
2020 Apr 09
2
RFC: Promoting experimental reduction intrinsics to first class intrinsics
No, we still use the shuffle expansion, which is why the issue isn't unique to the intrinsic. ~Craig On Thu, Apr 9, 2020 at 10:21 AM Amara Emerson <aemerson at apple.com> wrote: > Has x86 switched to the intrinsics now? > > On Apr 9, 2020, at 10:17 AM, Craig Topper <craig.topper at gmail.com> wrote: > > That recent X86 bug isn't unique to the intrinsic. We
2020 Jun 17
2
RFC: Promoting experimental reduction intrinsics to first class intrinsics
A minor point, but I think we need to more explicitly describe the order of floating point operations in the LangRef as well: "If the intrinsic call has the ‘reassoc’ or ‘fast’ flags set, then the reduction will not preserve the associativity of an equivalent scalarized counterpart. Otherwise the reduction will be ordered, thus implying that the operation respects the associativity of a
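The ordered versus 'reassoc' distinction quoted above can be illustrated with a small scalar model. This is only a sketch in Python, not LLVM IR and not the LangRef wording: the function names `ordered_fadd` and `reassoc_fadd` are hypothetical, Python floats are doubles, and the halving pattern is just one of many orders a reassociated reduction might pick.

```python
def ordered_fadd(start, lanes):
    """Model of an ordered reduction: ((start + v0) + v1) + ... strictly left to right."""
    acc = start
    for v in lanes:
        acc = acc + v
    return acc


def reassoc_fadd(start, lanes):
    """Model of ONE possible reassociated order: repeatedly add the upper half
    of the vector onto the lower half, then fold in the start value.
    Assumes the lane count is a power of two."""
    vec = list(lanes)
    while len(vec) > 1:
        half = len(vec) // 2
        vec = [vec[i] + vec[i + half] for i in range(half)]
    return start + vec[0]


lanes = [1e100, 1.0, -1e100, 1.0]
print(ordered_fadd(0.0, lanes))  # the first 1.0 is absorbed by 1e100
print(reassoc_fadd(0.0, lanes))  # the large terms cancel before the small ones are added
```

The two functions add the same four values but round at different points, which is why the order has to be spelled out whenever the flags do not permit reassociation.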
2020 Apr 09
2
RFC: Promoting experimental reduction intrinsics to first class intrinsics
That recent X86 bug isn't unique to the intrinsic. We generate the same code from this which uses the shuffle sequence the vectorizers generated before the reduction intrinsics existed. declare i64 @llvm.experimental.vector.reduce.or.v2i64(<2 x i64>) declare void @TrapFunc(i64) define void @parseHeaders(i64 * %ptr) { %vptr = bitcast i64 * %ptr to <2 x i64> * %vload = load
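The "shuffle sequence" mentioned in the message above can be modelled in a few lines. This is a hedged sketch in Python rather than IR; `shuffle_reduce_or` is a hypothetical name, and the halving pattern is an assumption about the general shape of such an expansion, not the exact code from the truncated message.

```python
from functools import reduce
from operator import or_


def shuffle_reduce_or(lanes):
    """Model a log2(N)-step OR reduction: each step 'shuffles' the upper half
    of the vector down and ORs it with the lower half; lane 0 holds the result.
    Assumes a non-empty, power-of-two lane count."""
    vec = list(lanes)
    width = len(vec)
    if width == 0 or width & (width - 1):
        raise ValueError("lane count must be a non-zero power of two")
    while width > 1:
        half = width // 2
        vec = [vec[i] | vec[i + half] for i in range(half)]
        width = half
    return vec[0]


lanes = [0b0001, 0b0010, 0b0100, 0b1000]
# Integer OR is associative and commutative, so any order gives the same value.
assert shuffle_reduce_or(lanes) == reduce(or_, lanes)
```

Because integer OR gives the same answer in any order, the intrinsic and the shuffle form are interchangeable here, which fits the point that the same code is generated either way.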
2017 Jan 31
4
RFC: Generic IR reductions
+cc Simon who's also interested in reductions for the any_true, all_true predicate vectors. On 31 January 2017 at 20:19, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote: > Hi Amara, > > We also had some discussions on the SVE side of reductions on the main > SVE thread, but this description is much more detailed than we had > before. > > I don't
2017 Jan 31
0
RFC: Generic IR reductions
Hi Amara, We also had some discussions on the SVE side of reductions on the main SVE thread, but this description is much more detailed than we had before. I don't want to discuss specifically about SVE, as the spec is not out yet, but I think we can cover a lot of ground until very close to SVE and do the final step when we get there. On 31 January 2017 at 17:27, Amara Emerson via
2017 Feb 01
2
RFC: Generic IR reductions
On 1 February 2017 at 08:27, Renato Golin <renato.golin at linaro.org> wrote: > Sorry, I meant min/max + reduce, just like above. > > %sum = add <N x float>, <N x float> %a, <N x float> %b > %min = @llvm.minnum(<N x float> %sum) > %red = @llvm.reduce(%min, float %acc) No, this is wrong. I actually meant overriding the max/min intrinsics to take
2017 Jan 31
2
RFC: Generic IR reductions
Hi all, During the Nov 2016 dev meeting, we had a hackers’ lab session where we discussed some issues about loop idiom recognition, IR representation and cost modelling. I took an action to write up an RFC about introducing reduction intrinsics to LLVM to represent horizontal operations across vectors. Vector reductions have been discussed in the past before, notably here:
2019 May 16
4
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
Hello again, I've been meaning to follow up on this thread for the last couple of weeks, my apologies for the delay. To summarise the feedback on the proposal for vector.reduce.fadd/fmul: There seems to be consensus to keep the explicit start value to better accommodate chained reductions (as opposed to generating IR that performs the reduction of the first element using extract/fadd/insert
2019 Jun 20
4
RFC: Memcpy inlining in IR
Hi all, For GlobalISel, we’re exploring options for implementing inlining optimizations for memcpy and friends. However, looking around the existing implementation, I don’t see anything that would be particularly problematic for us to do at the IR level. The existing TLI hooks that specify how certain memcpy calls should be lowered don’t have anything too SelectionDAG-specific, and an IR
2017 Dec 15
3
[GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!
I don’t know of any further issues preventing us flipping the switch. At this point, I’d aim to flip the switch shortly after the creation of the 6.0.0 release branch, so that GlobalISel can harden a bit more enabled-by-default on trunk before it goes into an LLVM release (presumably 7.0.0 then). Thanks, Kristof > On 11 Dec 2017, at 17:08, Amara Emerson <aemerson at apple.com> wrote:
2017 Feb 03
2
RFC: Generic IR reductions
Yes, SVE can vectorize early exit loops by using speculative (first-faulting) loads, which essentially give a predicate of the lanes loaded successfully. For uncounted loops with these special loads, the loop predicate tests can be done using a 'ptest' instruction, checking if the last element is active. Amara On 3 February 2017 at 10:15, Simon Pilgrim <llvm-dev at redking.me.uk>
2017 Feb 01
2
RFC: Generic IR reductions
> My proposal was to have a reduction intrinsic that can infer the type by the predecessors. > For example: > @llvm.reduce(ext <N x double> ( add <N x float> %a, %b)) And if we don't have %b? We just want to sum all elements of %a? Something like @llvm.reduce(ext <N x double> ( add <N x float> %a, zeroinitializer)) Don't we have a problem with constant
2017 Feb 02
3
RFC: Generic IR reductions
Thanks for the summary, some more comments inline. On 1 February 2017 at 22:02, Renato Golin <renato.golin at linaro.org> wrote: > On 1 February 2017 at 21:22, Saito, Hideki <hideki.saito at intel.com> wrote: >> I think we are converging enough at the detail level, but having a big >> difference in the opinions at the "vision" level. :) > > Vision is
2019 Jun 20
2
RFC: Memcpy inlining in IR
It looks like there are a lot of opinions about where memcpy expansion/inlining needs to happen: whether it belongs in (late) IR or is a backend problem; see also, for example, https://reviews.llvm.org/D35035. A complicating factor here is that efficient memcpy lowering is crucial for performance and code size (and these calls occur a lot). Either way, I agree that the TLI hooks are not SelectionDAG specific, they can be used in
2019 Apr 10
2
[RFC] Changes to llvm.experimental.vector.reduce intrinsics
> On 8 Apr 2019, at 11:37, Simon Moll <moll at cs.uni-saarland.de> wrote: > > Hi, > > On 4/5/19 10:47 AM, Simon Pilgrim via llvm-dev wrote: >> On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote: >>> On 04/04/2019 14:11, Sander De Smalen wrote: >>>> Proposed change: >>>> ---------------------------- >>>> In this RFC I
2020 May 05
4
Codegen pass configs dependent on function attributes?
Hi all. I’m trying to get GlobalISel to work better with LTO. At the moment if you enable it via -fglobal-isel, it only adds the -mllvm -global-isel and related options to the cc1 invocation. With LTO, that doesn’t work as we need to encode codegen options into the bitcode, usually via function attributes. Does anyone have any ideas on how to achieve this? The only way I can see it working is if
2017 Nov 10
5
RFC: [GlobalISel] Towards a generic MI combiner framework
Hi everyone, This RFC concerns the design and architecture of a generic machine instruction combiner/optimizer framework to be developed as part of the GISel pipeline. As we transition from correctness and reducing the fallback rate to SelectionDAG at -O0, we’re now starting to think about using GlobalISel with optimizations enabled. There are obviously many parts to this story as optimizations
2017 Feb 01
2
RFC: Generic IR reductions
> One that we have had multiple times and the usual consensus is: if it can be represented in plain IR, it must. Adding multiple semantics for the same concept, especially stiff ones like builtins, adds complexity to the optimiser. > Regardless of the merits in this case, builtins should only be introduced IFF there is no other way. So first we should discuss adding it to IR with generic
2017 Feb 10
2
RFC: Generic IR reductions
On 9 February 2017 at 17:31, Amara Emerson <amara.emerson at gmail.com> wrote: > Ping. Does anyone else have thoughts on this? Hi Amara, It seems the people who replied in this thread are mostly in sync with the proposal, why don't you push a review in phab, and let's take this to the next level? cheers, --renato