thr3ads.net - search: "whilelo"

Displaying 3 results from an estimated 3 matches for "whilelo".

2020 May 04

LV: predication

...u can then skip but then you lose out on it.... So, I really like this:

> If the problem is specifically figuring out the underlying element count given a predicate, maybe we could attack it from that angle? For example, introduce a special intrinsic for deriving the mask (sort of like the SVE whilelo).

That would be an excellent way of doing it and it would also map very well to MVE too, where we have a VCTP intrinsic/instruction that creates the mask/predicate (Vector Create Tail-Predicate). So I will go for this approach. Such an intrinsic was actually also proposed in Sam's original RFC...
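As a rough illustration of the mask-deriving idea discussed in this snippet, here is a minimal Python sketch of what a `whilelo`-style operation computes: lane `i` is active while `start + i < limit`. The names `whilelo_mask` and `active_count` are hypothetical helpers invented for this sketch, not LLVM or Arm APIs, and unsigned wraparound of `start + i` is ignored.

```python
def whilelo_mask(start: int, limit: int, lanes: int) -> list[bool]:
    """Hypothetical model of a whilelo-style mask.

    Lane i is active while start + i < limit. Wraparound of the
    unsigned addition is ignored in this sketch.
    """
    return [start + i < limit for i in range(lanes)]


def active_count(mask: list[bool]) -> int:
    """Number of elements the mask covers (the 'underlying element count')."""
    return sum(mask)


# Last iteration of a 10-element loop with 4 lanes starts at index 8,
# so only the first two lanes are active.
tail = whilelo_mask(8, 10, 4)
print(tail, active_count(tail))
```

Because the mask is derived directly from the induction variable and the trip count, the element count can be recovered from those two values without inspecting the predicate itself, which seems to be the angle the quoted suggestion is getting at.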

2016 Nov 04

[RFC] Supporting ARM's SVE in LLVM

...The main vector body of the resulting code is one instruction longer than it would be for NEON, but no scalar tail is required and performance will scale with register length. The *seriesvector*, *shufflevector*(splat), *icmp*, *propff*, *test* sequence has been recognized and transformed into the `whilelo` instruction.

```nasm
SimpleReduction:
// BB#0:
        subs    w9, w1, #1
        b.lt    .LBB0_4
// BB#1:
        add     x9, x9, #1
        mov     x8, xzr
        whilelo p0.s, xzr, x9
        mov     z0.s, #0
.LBB0_2:
        ld1w    {z1.s}, p0/z,...
```
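To make the "no scalar tail" point concrete, the following is a small Python emulation of a predicated reduction loop. It is a sketch under assumptions, not the RFC's code: the truncated assembly above is not reproduced, `whilelo_mask` and `predicated_sum` are hypothetical names, and `lanes` stands in for the hardware vector length.

```python
def whilelo_mask(start: int, limit: int, lanes: int) -> list[bool]:
    """Hypothetical whilelo model: lane i is active while start + i < limit."""
    return [start + i < limit for i in range(lanes)]


def predicated_sum(data: list[int], lanes: int) -> int:
    """Sum `data` using a masked vector loop with no scalar tail."""
    n = len(data)
    acc = [0] * lanes                  # vector accumulator, zero-initialised
    i = 0
    mask = whilelo_mask(i, n, lanes)   # predicate for the first iteration
    while any(mask):                   # loop while any lane is active
        for lane, active in enumerate(mask):
            if active:                 # inactive lanes load and add nothing
                acc[lane] += data[i + lane]
        i += lanes
        mask = whilelo_mask(i, n, lanes)
    return sum(acc)                    # final horizontal reduction


# The same loop works for any lane count, including ones that do not
# divide the element count evenly.
print(predicated_sum(list(range(1, 11)), 4))
```

The final partial iteration is handled by the mask rather than by a separate scalar loop, and the same code gives the same result for any value of `lanes`, which mirrors the claim that performance scales with register length.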

2020 May 01

LV: predication

Hi Eli,

> The problem with your proposal, as written, is that the vectorizer is producing the intrinsic. Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change. This is completely impractical because the intrinsic isn’t related to anything
