Displaying 2 results from an estimated 2 matches for "__v4sf".
2020 Aug 19
2
Question about llvm vectors
...orizontal sums of the values are stored in the upper bits of the
/// destination.
/// \returns A 128-bit vector of [4 x float] containing the horizontal sums of
/// both operands.
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_hadd_ps(__m128 __a, __m128 __b)
{
  return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b);
}
Here clang will translate _mm_hadd_ps to a CPU-specific instruction.
Why not create __builtin_vector_hadd(a, b), which would select the
CPU-specific instruction or fall back to a generic implementation?
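
To make the idea concrete, here is a rough sketch of what the generic
fallback half might look like. It assumes the GCC/Clang vector_size
extension and the same lane order as _mm_hadd_ps. The names v4f and
generic_hadd are made up for illustration; __builtin_vector_hadd itself
does not exist in clang.

```c
/* 4 x float vector type via the GCC/Clang vector_size extension.
   The name v4f is hypothetical, chosen for this sketch only. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical generic fallback with the lane layout of _mm_hadd_ps:
   result = { a0+a1, a2+a3, b0+b1, b2+b3 }.
   Uses only element access, so it does not require SSE3. */
static inline v4f generic_hadd(v4f a, v4f b)
{
    v4f r = { a[0] + a[1], a[2] + a[3],
              b[0] + b[1], b[2] + b[3] };
    return r;
}
```

For a = {1, 2, 3, 4} and b = {10, 20, 30, 40} this gives {3, 7, 30, 70}.
Whether the backend would turn such code into haddps on SSE3 targets is
a separate question that this sketch does not settle.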
Many thanks,
Alex
2020 Aug 20
2
Question about llvm vectors
...; /// destination.
>> /// \returns A 128-bit vector of [4 x float] containing the horizontal sums of
>> /// both operands.
>> static __inline__ __m128 __DEFAULT_FN_ATTRS
>> _mm_hadd_ps(__m128 __a, __m128 __b)
>> {
>>   return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b);
>> }
>>
>> Here clang will translate _mm_hadd_ps to a CPU-specific instruction.
>> Why not create __builtin_vector_hadd(a, b), which would select the
>> CPU-specific instruction or fall back to a generic implementation?
>>
>> Many thanks,
>>...