Vineet Kumar via llvm-dev
2021-Apr-01 22:50 UTC
[llvm-dev] [RFC] VP intrinsics support for the Loop Vectorizer
Hi All,

As the work on Vector Predication intrinsics (https://reviews.llvm.org/D57504, https://reviews.llvm.org/project/profile/87/) continues to progress, with significant parts already upstream, this RFC proposes using them as a target for the Loop Vectorizer. We have put up a proof-of-concept patch on Phabricator: https://reviews.llvm.org/D99750 ("[LV, VP] RFC: VP intrinsics support for the Loop Vectorizer (Proof-of-Concept)").

*Please see the patch summary for more technical details, alternative strategies, limitations, and a tentative development roadmap.*

The patch contains a prototype implementation that demonstrates the Loop Vectorizer generating VP intrinsics for simple integer operations on fixed vectors.

SIMD ISAs with active vector length predication support, such as the RISC-V V-extension, NEC SX-Aurora and Power VSX, stand to benefit especially, since there is currently no other reasonable way in LLVM IR to model the active vector length of vector instructions. ISAs with masked vector predication support, such as AVX512 and ARM SVE, could benefit by being able to use predicated operations beyond memory operations (which are currently limited to the masked load/store/gather/scatter intrinsics).

The approach builds on top of the existing tail-folding mechanism, but instead of generating masked memory intrinsics, it generates VP intrinsics for both memory and arithmetic operations. The patch also extends VPlan with new recipes for `PREDICATED-WIDENING` to VP intrinsics; this will eventually help in building and comparing VPlans for different strategies.

The patch also demonstrates different ways to compute the vector length parameter (EVL) of the VP intrinsics. The basic idea is to compute `min(VF, trip_count - index)` for each vector iteration.
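To make the EVL computation concrete, here is a small Python model of a tail-folded loop computing `c[i] = a[i] + b[i]`. It is a simulation sketch, not code from the patch: `VF`, `vp_add_store` and `vp_vector_add` are hypothetical names. It assumes VP semantics in which a lane is enabled only when its mask bit is set and its index is below EVL, and it contrasts `EVL = min(VF, trip_count - index)` with the fallback of `EVL = VF` plus a tail-folding mask.

```python
from typing import List

VF = 4  # hypothetical vectorization factor (lanes per vector)


def vp_add_store(a: List[int], b: List[int], c: List[int], base: int,
                 mask: List[bool], evl: int) -> None:
    """Model of a VP add feeding a VP store: lane i is enabled only if
    mask[i] is true AND i < evl; disabled lanes are not written."""
    for lane in range(VF):
        if mask[lane] and lane < evl:
            c[base + lane] = a[base + lane] + b[base + lane]


def vp_vector_add(a: List[int], b: List[int], c: List[int],
                  use_evl: bool) -> int:
    """Tail-folded c[i] = a[i] + b[i]; returns the number of vector iterations.
    No scalar remainder loop is needed in either mode."""
    n = len(a)  # trip count
    iterations = 0
    for index in range(0, n, VF):
        remaining = n - index
        if use_evl:
            # Active vector length predication: EVL = min(VF, trip_count - index)
            # with an all-true mask.
            evl = min(VF, remaining)
            mask = [True] * VF
        else:
            # No vector length support (assumption: mask still available):
            # EVL = VF and the tail-folding mask disables out-of-range lanes.
            evl = VF
            mask = [lane < remaining for lane in range(VF)]
        vp_add_store(a, b, c, index, mask, evl)
        iterations += 1
    return iterations
```

For a 10-element loop with `VF = 4`, both variants run three vector iterations; in the last one only two lanes are enabled, either through `EVL = 2` or through the mask. In the first variant the remainder is carried by the scalar EVL operand; in the second it is carried entirely by the mask.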
For targets without vector length predication, `VF` can be used as the EVL, and for targets with custom instructions for computing the vector length, an experimental intrinsic is proposed.

The patch is only meant as a proof of concept and intentionally limits itself to very simple cases: integer operations only, no control flow, no interleaving, among other restrictions. It also uses a command-line switch to force VP intrinsic support and requires tail folding to be explicitly enabled.

Best,
Vineet Kumar
vineet.kumar at bsc.es
Barcelona Supercomputing Center - Centro Nacional de Supercomputación