Hi,

On 1/31/19 11:20 PM, Jacob Lifshay wrote:
> We're in-progress designing a RISC-V extension
> (http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000433.html)
> that would have variable-length vectors of short vectors (1 to 4):
> <VL x <4 x float>>
> where each predicate bit masks out a whole short vector. We're using
> this extension to vectorize graphics code where variables in the
> pre-vectorization code are short vectors.
> So, vectorizing code like:
> for(int i = 0; i < 1000; i++)
> {
>     vec4 color = colors[i];
>     vec3 normal = normals[i];
>     color.rgb *= fmax(0.0, dot(normal, light_dir));
>     colors[i] = color;
> }
>
> I'm planning on passing already vectorized code into LLVM and using
> LLVM as a backend for optimization and JIT code generation.
>
> Do you think the EVL proposal would support an ISA like this as it's
> currently written (by pattern matching on predicate expansion and
> vector-length multiplication)?
> Or, do you think the EVL proposal would need modification to
> effectively support this (by adding an element group size argument to
> EVL intrinsics or something)?

We could untie the mask length from the data length:

    %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 1 x i1> %M, i32 %L)

would then indicate that the mask %M applies to groups of "4 / 1" float elements.

- Simon

> Jacob Lifshay
>
> On Thu, Jan 31, 2019, 07:58 Simon Moll via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> There is now an RFC for a roadmap to native vector predication
> support in LLVM and a prototype implementation:
>
> https://reviews.llvm.org/D57504
>
> The prototype demonstrates:
>
> - Predicated vector intrinsics with an explicit mask and vector
>   length parameter on IR level.
> - First-class predicated SDNodes on ISel level. Mask and vector
>   length are value operands.
> - An incremental strategy to generalize
>   PatternMatch/InstCombine/InstSimplify and DAGCombiner to work on
>   both regular instructions and EVL intrinsics.
> - DAGCombiner example: FMA fusion.
> - InstCombine/InstSimplify example: FSub pattern re-writes.
> - Early experiments on the LNT test suite (Clang static release,
>   O3 -ffast-math) indicate that compile time on non-EVL IR is not
>   affected by the API abstractions in PatternMatch, etc.
>
> We’d like to get your feedback, in particular on the following to
> move forward:
>
> - Can we agree on EVL intrinsics as a transitional step to
>   predicated IR instructions?
> - Can we agree on native EVL SDNodes for CodeGen?
> - Are the changes to InstCombine/InstSimplify/DAGCombiner and
>   utility classes that go with it acceptable?
>
> Thanks
> Simon
>
> --
>
> Simon Moll
> Researcher / PhD Student
>
> Compiler Design Lab (Prof. Hack)
> Saarland University, Computer Science
> Building E1.3, Room 4.31
>
> Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
> Fax. +49 (0)681 302-3065 : http://compilers.cs.uni-saarland.de/people/moll
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
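The grouped-mask reading suggested above (one bit of %M covering "4 / 1" float elements) can be modelled outside LLVM. The following is only a rough Python sketch of that reading, not the prototype's behaviour: the function names are hypothetical, vectors are plain lists, %L is assumed to count groups, and inactive lanes are represented as None because the thread does not say what value they hold.

```python
from typing import List, Optional


def expand_group_mask(mask: List[bool], data_len: int) -> List[bool]:
    """Repeat each mask bit over its group of data_len // len(mask) lanes."""
    if not mask or data_len % len(mask) != 0:
        raise ValueError("data length must be a multiple of the mask length")
    group = data_len // len(mask)
    return [bit for bit in mask for _ in range(group)]


def grouped_fsub(x: List[float], y: List[float],
                 mask: List[bool], vlen: int) -> List[Optional[float]]:
    """Model of a grouped, length-limited fsub.

    Assumption: vlen counts groups (short vectors), not individual floats.
    Lanes that are masked off or lie beyond vlen groups yield None.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same length")
    if not 0 <= vlen <= len(mask):
        raise ValueError("vlen exceeds the number of groups")
    lane_mask = expand_group_mask(mask, len(x))
    active_lanes = vlen * (len(x) // len(mask))
    return [a - b if (i < active_lanes and lane_mask[i]) else None
            for i, (a, b) in enumerate(zip(x, y))]


if __name__ == "__main__":
    # Two groups of four floats; the second group is masked off.
    x = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
    y = [1.0] * 8
    print(grouped_fsub(x, y, [True, False], vlen=2))
```

With a mask as long as the data, the group size is 1 and the model degenerates to ordinary per-element predication.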
On Fri, Feb 1, 2019 at 1:54 AM Simon Moll <moll at cs.uni-saarland.de> wrote:
> We could untie the mask length from the data length:
>
>     %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 1 x i1> %M, i32 %L)
>
> would then indicate that the mask %M applies to groups of "4 / 1" float
> elements.

Sounds good to me. I haven't checked if the current code allows for that.

Jacob
Luke Kenneth Casson Leighton via llvm-dev
2019-Feb-02 00:39 UTC
[llvm-dev] [RFC] Vector Predication
On Friday, February 1, 2019, Simon Moll <moll at cs.uni-saarland.de> wrote:
> We could untie the mask length from the data length:
>
>     %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 1 x i1> %M, i32 %L)
>
> would then indicate the mask %M applies to groups of "4 / 1" float
> elements.

That would provide the greatest flexibility, as a 1:1 ratio could mean 1 bit per element, covering the normal case.

Question: are there any circumstances under which it is desirable to underspecify or overspecify the number of bits in the predicate?

I.e., to deliberately have an FP vector of length 11 and a mask of length 9 or 13?

Or is that just a runtime error?

L.

--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On 2/2/19 1:39 AM, Luke Kenneth Casson Leighton wrote:
> That would provide the greatest flexibility, as a 1:1 ratio could mean
> 1 bit per element, covering the normal case.
>
> Question: are there any circumstances under which it is desirable to
> underspecify or overspecify the number of bits in the predicate?
>
> ie to deliberately have a FP vector of length 11 and a mask of length
> 9 or 13?

You are referring to the sub-vector sizes, if I am understanding correctly. I'd assume that the mask sub-vector length always has to be either 1 or the same as the data sub-vector length. For example, this is ok:

    %result = call <scalable 3 x float> @llvm.evl.fsub.v4f32(<scalable 3 x float> %x, <scalable 3 x float> %y, <scalable 1 x i1> %M, i32 %L)
    %result = call <scalable 5 x float> @llvm.evl.fsub.v4f32(<scalable 5 x float> %x, <scalable 5 x float> %y, <scalable 1 x i1> %M, i32 %L)
    %result = call <16 x float> @llvm.evl.fsub.v4f32(<16 x float> %x, <16 x float> %y, <4 x i1> %M, i32 %L)

This is invalid IR:

    %result = call <scalable 4 x float> @llvm.evl.fsub.v4f32(<scalable 4 x float> %x, <scalable 4 x float> %y, <scalable 2 x i1> %M, i32 %L)
    %result = call <scalable 11 x float> @llvm.evl.fsub.v4f32(<scalable 11 x float> %x, <scalable 11 x float> %y, <scalable 9 x i1> %M, i32 %L)
    %result = call <5 x float> @llvm.evl.fsub.v4f32(<5 x float> %x, <5 x float> %y, <7 x i1> %M, i32 %L)

In case you are talking about the dynamic vector length (e.g. what happens if the dynamic lengths don't match at runtime), I think the key here is to regard the vector length parameter "vlen %L" as a contract: the semantics of the EVL operation are undefined if the runtime lengths of the vectors are shorter than indicated by %L. That is, the mask has a minimum element count of %L * mask sub-vector length, and the data has a minimum element count of %L * data sub-vector length.

- Simon

> Or, is that just a runtime error.
>
> L.
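The two rules in the reply above (the static rule on sub-vector lengths and the "vlen %L" contract on runtime lengths) can be written down as small checks. This is a hedged Python sketch with hypothetical helper names. It only models the scalable-vector examples, where the sub-vector length is the N in <scalable N x T>; the fixed-width examples are not covered because the thread does not spell out how sub-vector lengths are derived for them.

```python
from typing import Tuple


def mask_subvector_ok(data_sub: int, mask_sub: int) -> bool:
    """Static rule: the mask sub-vector length is 1 or equals the data's."""
    return mask_sub == 1 or mask_sub == data_sub


def min_element_counts(vlen: int, data_sub: int, mask_sub: int) -> Tuple[int, int]:
    """Minimum (data, mask) element counts implied by the vlen contract."""
    return vlen * data_sub, vlen * mask_sub


def honors_contract(vlen: int, data_sub: int, mask_sub: int,
                    data_len: int, mask_len: int) -> bool:
    """True if the runtime lengths are at least what vlen promises.

    When this returns False, the operation's semantics are undefined
    under the contract reading; this sketch only reports the violation.
    """
    data_min, mask_min = min_element_counts(vlen, data_sub, mask_sub)
    return data_len >= data_min and mask_len >= mask_min


if __name__ == "__main__":
    # (data sub-vector length, mask sub-vector length) from the scalable examples.
    for data_sub, mask_sub in [(3, 1), (5, 1), (4, 2), (11, 9)]:
        print(data_sub, mask_sub, mask_subvector_ok(data_sub, mask_sub))
```

Under this reading, Luke's 11-versus-9 case is rejected statically, so it never becomes a runtime question; only a too-short runtime vector falls under the undefined-behaviour contract.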