thr3ads.net - search: "int_vector_reduce

Displaying 3 results from an estimated 3 matches for "int_vector_reduce_fmax".

2017 Jan 31

RFC: Generic IR reductions

...single instruction can pattern match the promote->reduce sequence.

And for minmax recurrence types, we have:

int_vector_reduce_smax(vector_src)
int_vector_reduce_smin(vector_src)
int_vector_reduce_umax(vector_src)
int_vector_reduce_umin(vector_src)
int_vector_reduce_fmin(vector_src, i32 NoNaNs)
int_vector_reduce_fmax(vector_src, i32 NoNaNs)

These reduction operations can be mapped from the recurrence kinds defined in LoopUtils, however any front-end or pass in LLVM may use them.

Predication
===========

We have multiple options for expressing vector predication in reductions:

1. The first is to simply add a...
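For readers unfamiliar with horizontal reductions, here is a rough scalar model of what the smax, umax and fmax intrinsics named in this excerpt might compute. It is only a sketch: the Python function names, the 32-bit default lane width, and above all the NaN handling when NoNaNs is not set (NaN lanes are skipped, as C's fmax does) are assumptions made for illustration. The excerpt does not specify any of them.

```python
import math
from functools import reduce
from typing import Sequence


def reduce_smax(lanes: Sequence[int]) -> int:
    """Signed maximum across all lanes of one vector."""
    return max(lanes)


def reduce_umax(lanes: Sequence[int], bits: int = 32) -> int:
    """Unsigned maximum: compare lanes by their two's-complement bit pattern.

    The lane width (bits) is an assumed parameter, not part of the excerpt.
    """
    mask = (1 << bits) - 1
    return max(lanes, key=lambda x: x & mask)


def _fmax_ignoring_nan(a: float, b: float) -> float:
    """Two-operand max that returns the non-NaN operand (assumed semantics)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


def reduce_fmax(lanes: Sequence[float], no_nans: bool = False) -> float:
    """Floating-point maximum across lanes.

    no_nans models the 'i32 NoNaNs' operand: the caller promises that no lane
    is NaN, so a plain max is enough. Without that promise this sketch skips
    NaN lanes, which is an assumption and not something the excerpt states.
    """
    if no_nans:
        return max(lanes)
    return reduce(_fmax_ignoring_nan, lanes)
```

With lanes `[-1, 2, 3]`, the signed reduction picks 3 while the unsigned one picks -1, because the bit pattern of -1 is the largest unsigned value. This is why the list carries separate smax and umax entries.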


2017 Jan 31

RFC: Generic IR reductions

...IR builtin be something like: @llvm.vector.reduce.add(...) ?

> These intrinsics do not do any type promotion of the scalar result. Architectures like SVE which can do type promotion and reduction in a single instruction can pattern match the promote->reduce sequence.

Yup.

> ...
> int_vector_reduce_fmax(vector_src, i32 NoNaNs)

A large list, and probably doesn't even cover all SVE can do, let alone other reductions. Why not simplify this into something like:

  %sum = add <N x float>, <N x float> %a, <N x float> %b
  %red = @llvm.reduce(%sum, float %acc)

or

  %fast_red = @ll...


2017 Jan 31

RFC: Generic IR reductions

...reduce.[operation] in IR asm.
>
>> These intrinsics do not do any type promotion of the scalar result. Architectures like SVE which can do type promotion and reduction in a single instruction can pattern match the promote->reduce sequence.
>
> Yup.
>
>> ...
>> int_vector_reduce_fmax(vector_src, i32 NoNaNs)
>
> A large list, and probably doesn't even cover all SVE can do, let
> alone other reductions.
>
> Why not simplify this into something like:
>
>   %sum = add <N x float>, <N x float> %a, <N x float> %b
>   %red = @llvm.reduce(%...
