2020 Aug 19
Question about llvm vectors
...operands.
/// The horizontal sums of the values are stored in the upper bits of the
/// destination.
/// \returns A 128-bit vector of [4 x float] containing the horizontal sums of
///    both operands.
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_hadd_ps(__m128 __a, __m128 __b)
{
return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b);
}
Here clang translates _mm_hadd_ps into a CPU-specific builtin, which maps to the SSE3 HADDPS instruction.
Why not create a __builtin_vector_hadd(a, b) that would select the CPU-specific
instruction when available, or fall back to a generic implementation otherwise?
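A minimal sketch of what such a builtin could expand to, assuming GCC/clang vector extensions. `v4sf`, `generic_hadd_ps` and `vec_hadd_ps` are hypothetical names (there is no `__builtin_vector_hadd` in clang). The dispatcher uses the SSE3 intrinsic when the target enables SSE3 at compile time and otherwise computes the same lanes HADDPS produces: {a0+a1, a2+a3, b0+b1, b2+b3}.

```c
#if defined(__SSE3__)
#include <pmmintrin.h> /* _mm_hadd_ps (SSE3) */
#endif

/* Hypothetical 4 x float vector type built with GCC/clang vector extensions. */
typedef float v4sf __attribute__((vector_size(16)));

/* Generic fallback: same lane layout as HADDPS. */
static inline v4sf generic_hadd_ps(v4sf a, v4sf b)
{
    v4sf r = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
    return r;
}

/* Hypothetical dispatcher: the choice is made at compile time only. */
static inline v4sf vec_hadd_ps(v4sf a, v4sf b)
{
#if defined(__SSE3__)
    /* __m128 and v4sf are both 16-byte float vectors, so a cast suffices. */
    return (v4sf)_mm_hadd_ps((__m128)a, (__m128)b);
#else
    return generic_hadd_ps(a, b);
#endif
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, both paths should yield {3, 7, 30, 70}. This sketch only selects by compile-time flags such as -msse3 or -march; choosing at run time would need something like GCC/clang's `__builtin_cpu_supports("sse3")` in front of separately compiled variants.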
Many thanks,
Alex
2020 Aug 20
Question about llvm vectors
Hi Craig,
Thank you very much for your answer.
I did not mean to discuss the exact semantics and name of one particular
operation, but rather to raise the question: "would it be beneficial to have
more vector builtins?"
You wrote that the compiler will recognize a pattern and replace it with
__builtin_ia32_haddps when possible, but how can I be sure of that? I would
have to disassemble the generated code, right? That is very impractical,
isn't it? It also suggests that each CPU target has a bank of patterns it
can recognize, but wouldn't it be very similar to have advanced generic vector...