similar to: LV: predication

Displaying 20 results from an estimated 8000 matches similar to: "LV: predication"

2020 May 01
5
LV: predication
Hi Eli, > The problem with your proposal, as written, is that the vectorizer is producing the intrinsic. Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change. This is completely impractical because the intrinsic isn’t related to anything
2020 May 20
2
LV: predication
Hi Ayal, Let me start by commenting on this: > A dedicated intrinsic that freezes the compare instruction, for no apparent reason, may potentially cripple subsequent passes from further optimizing the vectorized loop. The point is that we have a very good reason, which is that it passes the right information on to the backend, enabling optimisations as opposed to crippling them. The compare
2020 May 21
2
LV: predication
> The compare of interest is clear, I think. It compares a Vector Induction Variable with a broadcasted loop invariant value, aka the BTC. Obtaining the latter operand is the goal, clearly, but to do so, the former operand needs to be recognized as a VIV. Yep, exactly that. > What if this compare is not generated by LV’s fold-tail-by-masking transformation? Not sure I completely follow
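
For reference, a minimal sketch (with an illustrative VF of 4 and i32 types, not taken from the thread) of the compare that fold-tail-by-masking produces: the vector induction variable is compared against a broadcast of the backedge-taken count (BTC).

  %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
  %vec.iv = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
  %btc.splatinsert = insertelement <4 x i32> undef, i32 %btc, i32 0
  %btc.splat = shufflevector <4 x i32> %btc.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
  %active.mask = icmp ule <4 x i32> %vec.iv, %btc.splat

Recognizing this VIV-versus-splat(BTC) shape after the fact is exactly the pattern-matching burden discussed in the thread; the mask then guards the loop's masked loads and stores.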
2020 May 18
2
LV: predication
> You have similar problems with https://reviews.llvm.org/D79100 The new revision of D79100 (https://reviews.llvm.org/D79100) solves your comment 1), and I don't think your comments 2) and 3) apply, as there are no vendor-specific intrinsics involved at all here. Just to quickly discuss the optimisation pipeline, D79100 is a small extension for the
2020 May 19
2
LV: predication
Invitation accepted; I am happy to help out with reviews, like I did with the previous VP patches. And of course I agree that things should be well defined, and that we shouldn't paint ourselves into a corner, but I don't think that this is the case here. And it's not that I am in a rush, but I don't think this change needs to be predicated on a big change landing first, like the LV
2020 May 19
3
LV: predication
Hi Simon, Thanks for reposting the example; looking at it more carefully, I think it is very similar to my first proposal. That was met with some resistance here because it dumps loop information in the vector preheader. Doing it this early (we want to emit this in the vectoriser) puts a restriction on (future) optimisations that transform vector loops: they would have to honour/update/support this intrinsic
2020 May 04
3
LV: predication
> The harm comes if the intrinsic ends up with the wrong value, or attached to the wrong loop. The intrinsic is marked as IntrNoDuplicate, so I wasn't worried about it ending up somewhere else. Also, it is a property of a specific loop, a tail-folded vector loop, that I think holds even after it is transformed. I.e., unrolling a vector loop is probably not what you want, but even if you do
2020 May 18
2
LV: predication
Hi, I abandoned that approach and followed Eli's suggestion, see somewhere earlier in this thread, and emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision for D79100 that implements this. Cheers.
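
As a rough sketch of what such an active-mask intrinsic looks like in IR (lane count and types are illustrative; the semantics are that lane i is active iff %base + i < %n, compared as unsigned):

  declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %base, i32 %n)

  %active.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %tc)

Here %index is the start index of the current vector iteration and %tc the scalar trip count. Compared with the open-coded icmp against a splat of the BTC, this keeps the "this is the tail-folding mask" fact explicit for later passes and the backend.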
2020 May 04
3
LV: predication
Hi Roger, That's a good example that shows most of the moving parts involved here. In a nutshell, the difference, and what we would like to make explicit, is the vector trip count versus the scalar loop trip count. In your IR example, the loads/stores are predicated on a mask that is calculated from a splat induction variable, which is compared with the vector trip count. Illustrated with your
2019 Jul 15
2
Tail-Loop Folding/Predication
I am looking for feedback on adding support for a new loop pragma to Clang/LLVM. With "#pragma tail_predicate" the idea would be to indicate that a loop epilogue/tail can, or should, be folded into the main loop. I see two use cases for this pragma. First, this could be interesting for the vectorizer. It currently supports tail folding by masking all loop instructions/blocks, but does this
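
As a sketch of how such a source-level hint could surface in IR, the loop's latch branch could carry loop metadata along the following lines (the metadata string is hypothetical, echoing the proposed pragma name rather than any finalized spelling; existing vectoriser hints such as llvm.loop.vectorize.enable use the same scheme):

  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

  !0 = distinct !{!0, !1}
  !1 = !{!"llvm.loop.vectorize.tail_predicate.enable", i1 true}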
2020 Nov 06
4
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
On 11/6/20 8:49 AM, Roger Ferrer Ibáñez wrote: Hi Sjoerd, Trying to remember how everything fits together here, but could get.active.lane.mask not create the %mask of the VP intrinsics? Or in other words, in the vectoriser, who's producing the %mask and %evl that are consumed by the VP intrinsics? I'm not sure what would be the best way here. I'm thinking about the Loop Vectorizer. I imagine
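
For context, a minimal sketch of a VP intrinsic and the two operands in question (types, names and VF are illustrative): every VP operation takes both a mask and an explicit vector length (EVL), and the question above is which part of the pipeline materializes each.

  declare <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)

  ; %mask could come from get.active.lane.mask; %evl from a target-specific
  ; computation such as min(VF, %n - %i).
  %sum = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %x, <4 x i32> %y, <4 x i1> %mask, i32 %evl)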
2020 Nov 06
2
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
On 11/6/20 12:39 PM, Sjoerd Meijer wrote: Hello Simon, Thanks for your replies, very useful. And yes, thanks for the example and making the target differences clear:
; Some examples:
; RISC-V V & VE(*):
;   %mask = (splat i1 1)
;   %evl = min(256, %n - %i)
; MVE/SVE:
;   %mask = get.active.lane.mask(%i, %n)
;   %evl = call @llvm.vscale()
; AVX:
;   %mask = icmp (%i + (seq
2020 Nov 05
2
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
For RISC-V V and VE being explicit about %evl is important for performance & correctness and that is what VP does. The get.active.lane.mask intrinsic is used as a hint for the MVE, SVE backends to use hardware tail-predication (the backends reverse engineer that hint by pattern matching for get.active.lane.mask in the mask parameter of "some" masked intrinsics). IMHO, it's more
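
The shape being pattern-matched is, roughly, an active-lane mask feeding the mask operand of a masked memory intrinsic; a hedged sketch with illustrative types, alignment and (pre-opaque-pointer) pointer style:

  %mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %tc)
  %val = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %src, i32 4, <4 x i1> %mask, <4 x i32> undef)
  call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %val, <4 x i32>* %dst, i32 4, <4 x i1> %mask)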
2020 Nov 05
0
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
Hi all, On 11/5/20 10:32 AM, Roger Ferrer Ibáñez wrote: Hi Sjoerd, thanks for pointing us to this intrinsic. I see it returns a mask/predicate type. My understanding is that VPred intrinsics have both a vector length operand and a mask operand. It looks to me that a "popcount" of get.active.lane.mask would correspond to the vector length operand. Then additional "control
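
That correspondence can be sketched as follows (purely illustrative, not code any pass emits): because the active lanes of a tail-folding mask are contiguous from lane 0, counting its set bits gives the value a VP intrinsic would take as its explicit vector length.

  %mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %i, i32 %n)
  %bits = bitcast <4 x i1> %mask to i4
  %count = call i4 @llvm.ctpop.i4(i4 %bits)
  %evl = zext i4 %count to i32          ; candidate vector length operand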
2020 Aug 07
2
Branches which return values in SelectionDAG
Hi all, I am working on modeling an instruction similar to SystemZ's 'BRCT', which takes a register, decrements it, and branches if the register is nonzero. I saw that the LLVM backend for SystemZ generates the instruction in a MachineFunctionPass as part of a pass intended to eliminate or combine compares. I then looked at ARM, where it uses the HardwareLoops pass first, and then a
2020 Mar 26
5
canonical form loops
Hello, Quick question to see if I haven't missed anything: I would like to convert counting-down loops, i.e. loops with a constant -1 step value, to counting-up loops, because the vectoriser is able to deal better with these loops (see e.g. D76838 that I was discussing today with Ayal). It looks like LoopSimplifyCFG and IndVarSimplify don't do this, so I was just curious if I haven't
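
For concreteness, a hedged sketch of the rewrite being asked about (illustrative IR, not the output of any existing pass): a loop whose induction variable steps by -1 down to zero, and the equivalent counting-up form that the vectoriser handles better. Any reversed memory accesses in the body would index with %n - 1 - %j after the rewrite.

  down:                                             ; iv = n, n-1, ..., 1
    %iv = phi i32 [ %n, %entry ], [ %iv.next, %down ]
    %iv.next = add nsw i32 %iv, -1
    %done = icmp eq i32 %iv.next, 0
    br i1 %done, label %exit, label %down

  up:                                               ; j = 0, 1, ..., n-1
    %j = phi i32 [ 0, %entry ], [ %j.next, %up ]
    %j.next = add nuw nsw i32 %j, 1
    %done.up = icmp eq i32 %j.next, %n
    br i1 %done.up, label %exit, label %up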
2020 Nov 09
0
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
; RISC-V V & VE(*):
;   %mask = get.active.lane.mask(%i, %i)
;   %evl = min(256, %n - %i)
; MVE/SVE/AVX:
;   %mask = get.active.lane.mask(%i, %n)
;   %evl = call @llvm.vscale()
For VE, we want to do as much predication as possible through %evl and as little as possible with %mask. This has performance implications on VE and RISC-V - VE does not generate a mask from %evl but %evl is
2020 Apr 01
2
canonical form loops
Interesting, thanks for digging this up! > As a consequence, any loop structure that is recognized > by SCEV will (/should) not profit from rewriting. As discussed in https://reviews.llvm.org/D68577#1742745, and as PR40816 showed, there is still merit and profit in further simplifying loop induction variables, or at least the primary one; somewhat independent of continuing to rely on SCEV for
2020 Nov 06
0
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
Hello Simon, Thanks for your replies, very useful. And yes, thanks for the example and making the target differences clear:
; Some examples:
; RISC-V V & VE(*):
;   %mask = (splat i1 1)
;   %evl = min(256, %n - %i)
; MVE/SVE:
;   %mask = get.active.lane.mask(%i, %n)
;   %evl = call @llvm.vscale()
; AVX:
;   %mask = icmp (%i + (seq <8 x i32> 0,1,2,.,)), %n,
; %evl
2019 May 20
3
[RFC] Intrinsics for Hardware Loops
Hi, Arm have recently announced the v8.1-M architecture specification for our next generation of microcontrollers. The architecture includes vector extensions (MVE) and support for low-overhead branches (LoB), which can be thought of as a style of hardware loop. Hardware loops aren't new to LLVM; other backends (at least Hexagon and PPC, that I know of) also include support. These implementations
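
From memory, the intrinsics proposed in that RFC follow roughly this shape (exact names and overloads may differ from what eventually landed, so treat this as a hedged sketch): one call in the preheader seeds the hardware loop counter, and one call in the latch decrements it and produces the continue condition, which the backend can then turn into a low-overhead branch.

  declare void @llvm.set.loop.iterations.i32(i32)
  declare i1 @llvm.loop.decrement.i32(i32)

  preheader:
    call void @llvm.set.loop.iterations.i32(i32 %n)
    br label %loop

  loop:
    ; ... loop body ...
    %cont = call i1 @llvm.loop.decrement.i32(i32 1)
    br i1 %cont, label %loop, label %exit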