similar to: Tail-Loop Folding/Predication

Displaying 20 results from an estimated 10000 matches similar to: "Tail-Loop Folding/Predication"

2020 May 04
3
LV: predication
> The harm comes if the intrinsic ends up with the wrong value, or attached to the wrong loop. The intrinsic is marked as IntrNoDuplicate, so I wasn't worried about it ending up somewhere else. Also, it is a property of a specific loop, a tail-folded vector loop, that holds even after it is transformed, I think. I.e., unrolling a vector loop is probably not what you want, but even if you do
2020 May 04
3
LV: predication
Hi Roger, That's a good example that shows most of the moving parts involved here. In a nutshell, the difference, and what we would like to make explicit, is the vector loop trip count versus the scalar loop trip count. In your IR example, the loads/stores are predicated on a mask that is calculated from a splat induction variable, which is compared with the vector trip count. Illustrated with your
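For readers new to the thread, a minimal sketch of the pattern under discussion (my own reconstruction, not IR from the thread): a tail-folded vector loop in which a vector induction variable (VIV) is compared against a broadcast of the back-edge-taken count (BTC), and the resulting mask predicates the loads/stores. The names, VF=4 and the i32 element type are assumptions made for illustration, and it assumes %n > 0.

; Tail-folded vector loop sketch: mask = (VIV <= broadcast BTC)
define void @tail_folded(ptr %dst, ptr %src, i32 %n) {
entry:
  %btc = sub i32 %n, 1                                   ; back-edge-taken count (assumes %n > 0)
  %btc.ins = insertelement <4 x i32> poison, i32 %btc, i32 0
  %btc.splat = shufflevector <4 x i32> %btc.ins, <4 x i32> poison, <4 x i32> zeroinitializer
  br label %vector.body

vector.body:
  %index = phi i32 [ 0, %entry ], [ %index.next, %vector.body ]
  %viv = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %entry ], [ %viv.next, %vector.body ]
  %mask = icmp ule <4 x i32> %viv, %btc.splat            ; lane stays active while viv <= BTC
  %src.gep = getelementptr i32, ptr %src, i32 %index
  %wide = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %src.gep, i32 4, <4 x i1> %mask, <4 x i32> poison)
  %add = add <4 x i32> %wide, <i32 1, i32 1, i32 1, i32 1>
  %dst.gep = getelementptr i32, ptr %dst, i32 %index
  call void @llvm.masked.store.v4i32.p0(<4 x i32> %add, ptr %dst.gep, i32 4, <4 x i1> %mask)
  %index.next = add i32 %index, 4
  %viv.next = add <4 x i32> %viv, <i32 4, i32 4, i32 4, i32 4>
  %done = icmp uge i32 %index.next, %n
  br i1 %done, label %exit, label %vector.body

exit:
  ret void
}

declare <4 x i32> @llvm.masked.load.v4i32.p0(ptr, i32 immarg, <4 x i1>, <4 x i32>)
declare void @llvm.masked.store.v4i32.p0(<4 x i32>, ptr, i32 immarg, <4 x i1>)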
2020 May 01
3
LV: predication
Hello, We are working on predication for our vector extension (MVE). Since quite a few people are working on predication and different forms of it (e.g. SVE, RISC-V, NEC), I thought I would share what we would like to add to the loop vectoriser. Hopefully it's just a minor and non-intrusive addition, but it could be interesting and useful for others, and feedback on this is of course welcome. TL;DR:
2020 May 01
5
LV: predication
Hi Eli, > The problem with your proposal, as written, is that the vectorizer is producing the intrinsic. Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change. This is completely impractical because the intrinsic isn’t related to anything
2020 May 20
2
LV: predication
Hi Ayal, Let me start with commenting on this: > A dedicated intrinsic that freezes the compare instruction, for no apparent reason, may potentially cripple subsequent passes from further optimizing the vectorized loop. The point is that we have a very good reason, which is that it passes on the right information to the backend, enabling optimisations as opposed to crippling them. The compare
2020 May 18
2
LV: predication
> You have similar problems with https://reviews.llvm.org/D79100 The new revision D79100 <https://reviews.llvm.org/D79100> solves your comment 1), and I don't think your comments 2) and 3) apply, as there are no vendor-specific intrinsics involved at all here. Just to quickly discuss the optimisation pipeline, D79100 <https://reviews.llvm.org/D79100> is a small extension for the
2019 Sep 10
3
loop vectorizer disabling
I would like to propose that loop pragma `vectorize(disable)` actually means disabling the vectorizer for that loop. This perhaps sounds really obvious (I hope it does), but currently `vectorize(disable)` sets the vectorization width to 1, and that means the vectorizer will run and could perform other tricks such as interleaving. The main reason to change the behaviour is that it will be more what
2020 May 19
3
LV: predication
Hi Simon, Thanks for reposting the example; looking at it more carefully, I think it is very similar to my first proposal. This was met with some resistance here because it dumps loop information in the vector preheader. Doing it this early (we want to emit this in the vectoriser) puts a restriction on (future) optimisations that transform vector loops to honour/update/support this intrinsic
2020 May 18
2
LV: predication
Hi, I abandoned that approach and followed Eli's suggestion (see earlier in this thread): emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision of D79100 that implements this. Cheers.
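As a sketch of what such an active-mask intrinsic looks like (my own illustration based on the intrinsic as it exists in LLVM today; the exact form discussed in D79100 may have differed), the explicit compare of the VIV against the broadcast BTC is replaced by a single call that the backends can recognise. %index, %n and %src.gep are the hypothetical names from the earlier sketch:

; lane i of %active is set iff %index + i < %n
%active = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %n)
%wide = call <4 x i32> @llvm.masked.load.v4i32.p0(ptr %src.gep, i32 4, <4 x i1> %active, <4 x i32> poison)

declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32, i32)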
2020 May 19
2
LV: predication
Invitation accepted, I am happy to help out with reviews, like I did with the previous VP patches. And of course agreed that things should be well defined, and that we shouldn't paint ourselves into a corner, but I don't think that this is the case. And it's not that I am in a rush, but I don't think this change needs to be predicated on a big change landing first like the LV
2020 May 21
2
LV: predication
> The compare of interest is clear, I think. It compares a Vector Induction Variable with a broadcasted loop invariant value, aka the BTC. Obtaining the latter operand is the goal, clearly, but to do so, the former operand needs to be recognized as a VIV. Yep, exactly that. > What if this compare is not generated by LV’s fold-tail-by-masking transformation? Not sure I completely follow
2020 Nov 06
2
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
On 11/6/20 12:39 PM, Sjoerd Meijer wrote: Hello Simon, Thanks for your replies, very useful. And yes, thanks for the example and making the target differences clear:
; Some examples:
; RISC-V V & VE(*):
;   %mask = (splat i1 1)
;   %evl  = min(256, %n - %i)
; MVE/SVE:
;   %mask = get.active.lane.mask(%i, %n)
;   %evl  = call @llvm.vscale()
; AVX:
;   %mask = icmp (%i + (seq
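To make the %mask/%evl split concrete, here is a minimal sketch (my own, not code from the thread) of a loop body where both are produced and then consumed by VP intrinsics; VF=4, i32 elements, and the variable names are assumptions. A RISC-V V/VE-style target would use an all-true mask and let %evl do the predication, whereas MVE/SVE-style targets would keep %evl at the full vector length and rely on the mask:

; %i is the scalar induction variable, %n the trip count
%avl  = sub i32 %n, %i                        ; elements still to process
%cmp  = icmp ult i32 %avl, 4
%evl  = select i1 %cmp, i32 %avl, i32 4       ; effective vector length for this iteration
%mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %i, i32 %n)
%gep  = getelementptr i32, ptr %src, i32 %i
%va   = call <4 x i32> @llvm.vp.load.v4i32.p0(ptr %gep, <4 x i1> %mask, i32 %evl)
%vb   = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %va, <4 x i32> %va, <4 x i1> %mask, i32 %evl)

declare <4 x i1>  @llvm.get.active.lane.mask.v4i1.i32(i32, i32)
declare <4 x i32> @llvm.vp.load.v4i32.p0(ptr, <4 x i1>, i32)
declare <4 x i32> @llvm.vp.add.v4i32(<4 x i32>, <4 x i32>, <4 x i1>, i32)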
2020 Nov 06
4
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
On 11/6/20 8:49 AM, Roger Ferrer Ibáñez wrote: Hi Sjoerd, Trying to remember how everything fits together here, but could get.active.lane.mask not create the %mask of the VP intrinsics? Or in other words, in the vectoriser, who is producing the %mask and %evl that are consumed by the VP intrinsics? I'm not sure what the best way would be here; I'm thinking of the Loop Vectorizer. I imagine
2019 Jul 12
2
[cfe-dev] ARM float16 intrinsic test
Dear list,
git checkout llvmorg-8.0.0 -b llvm8.0
cmake -G "Unix Makefiles" ../llvm-project/llvm -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU;ARM;AArch64"
[arm.cpp]
#define vst4_lane_f16(__p0, __p1, __p2) __extension__ ({ \
  float16x4x4_t __s1 = __p1; \
  __builtin_neon_vst4_lane_v(__p0, __s1.val[0],
2020 Nov 05
2
Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
For RISC-V V and VE, being explicit about %evl is important for performance & correctness, and that is what VP does. The get.active.lane.mask intrinsic is used as a hint for the MVE and SVE backends to use hardware tail-predication (the backends reverse-engineer that hint by pattern matching for get.active.lane.mask in the mask parameter of "some" masked intrinsics). IMHO, it's more
2019 Jul 12
2
[cfe-dev] ARM float16 intrinsic test
Hi, I do not get your result. Am I missing something?
$COMP_ROOT/clang++ --target=arm-arm-eabihf -march=armv8.2a+fp16 arm.cpp -S -o - -O3
  .text
  .syntax unified
  .eabi_attribute 67, "2.09"
  .eabi_attribute 6, 14
  .eabi_attribute 7, 65
  .eabi_attribute 8, 1
  .eabi_attribute 9, 2
  .fpu crypto-neon-fp-armv8
  .eabi_attribute 12, 4
2019 May 20
3
[RFC] Intrinsics for Hardware Loops
Hi, Arm have recently announced the v8.1-M architecture specification for our next-generation microcontrollers. The architecture includes vector extensions (MVE) and support for low-overhead branches (LoB), which can be thought of as a style of hardware loop. Hardware loops aren't new to LLVM; other backends (at least Hexagon and PPC, that I know of) also include support. These implementations
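For context, the hardware-loop intrinsics that came out of this RFC look roughly like the sketch below (my reconstruction of the shape they have in LLVM today, not text from the RFC): an intrinsic in the preheader sets the iteration count, and a decrement intrinsic in the latch returns whether the loop should continue.

preheader:
  call void @llvm.set.loop.iterations.i32(i32 %count)
  br label %loop

loop:
  ; ... loop body ...
  %cont = call i1 @llvm.loop.decrement.i32(i32 1)
  br i1 %cont, label %loop, label %exit

declare void @llvm.set.loop.iterations.i32(i32)
declare i1 @llvm.loop.decrement.i32(i32)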
2019 Aug 13
2
[LLVM] (RFC) Addition/Support of new Vectorization Pragmas in LLVM
vecremainder/novecremainder: Should the pragma simply call the vectorizer to attempt to vectorize the remainder loop, or should the vectorizer use a different method? > > Something like that. There were patches posted at some point to enable tail-loop vectorization. At this point, I imagine that you'd construct a VPlan with the vectorized tail. Yep, committed in
2013 Jul 21
1
[LLVMdev] Disable vectorization for unaligned data
If I got you right, this is the classic case for loop peeling. I thought LLVM's vectorizer already had something like that built in. On 21 July 2013 18:16, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > I will have to work on this soon as ARM also has pretty inefficient > unaligned vector loads. > NEON does support unaligned access via VLD*/VST*; what loads are you
2019 Oct 07
2
vectorize.enable
Hi, > The problem I see is that the warning isn't very actionable. Fully agreed. > Good warnings are supposed to be actionable, but what is the developer supposed to do in this case? This diagnostic is unclear. But to be more precise, the first part says the optimisation could not be performed. This is spot on, and an improvement over what we had before, because that didn't issue