thr3ads.net - search: "_

Displaying 2 results from an estimated 2 matches for "__v4sf".

2020 Aug 19

Question about llvm vectors

...orizontal sums of the values are stored in the upper bits of the
/// destination.
/// \returns A 128-bit vector of [4 x float] containing the horizontal sums of
/// both operands.
static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_hadd_ps(__m128 __a, __m128 __b)
{
  return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b);
}

Here clang will translate _mm_hadd_ps to a CPU specific feature. Why not create __builtin_vector_hadd(a, b) which would select the CPU specific instruction or a fallback generic implementation?

Many thanks,
Alex
...
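To make the question concrete, here is a minimal sketch of what such a "select or fall back" helper might look like when written by hand in C. The name vector_hadd4 and the v4sf typedef are hypothetical and used only for illustration; this is not the proposed __builtin_vector_hadd, and it is not an existing clang builtin. It assumes a compiler that supports the GCC/clang vector_size extension, and it uses _mm_hadd_ps only when the translation unit is built with SSE3 enabled.

```c
#include <assert.h>
#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3 intrinsics, including _mm_hadd_ps */
#endif

/* Hypothetical 4 x float vector type, using the GCC/clang vector extension. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper: horizontal add of adjacent pairs.
 * Result lanes: { a0+a1, a2+a3, b0+b1, b2+b3 }, the same layout
 * that _mm_hadd_ps produces. */
static inline v4sf vector_hadd4(v4sf a, v4sf b)
{
#if defined(__SSE3__)
    /* CPU-specific path: same-size vector casts reinterpret the bits. */
    return (v4sf)_mm_hadd_ps((__m128)a, (__m128)b);
#else
    /* Generic fallback: plain element-wise arithmetic on the lanes. */
    v4sf r = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
    return r;
#endif
}

/* Small self-check with values that are exact in float. */
static inline void vector_hadd4_check(void)
{
    v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
    v4sf b = { 10.0f, 20.0f, 30.0f, 40.0f };
    v4sf r = vector_hadd4(a, b);
    assert(r[0] == 3.0f && r[1] == 7.0f && r[2] == 30.0f && r[3] == 70.0f);
}
```

The selection here happens at compile time through the preprocessor, which is the part a generic builtin would presumably take over from the user.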

2020 Aug 20

Question about llvm vectors

...; /// destination.
>> /// \returns A 128-bit vector of [4 x float] containing the horizontal sums of
>> /// both operands.
>> static __inline__ __m128 __DEFAULT_FN_ATTRS
>> _mm_hadd_ps(__m128 __a, __m128 __b)
>> {
>>   return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b);
>> }
>>
>> Here clang will translate _mm_hadd_ps to a CPU specific feature.
>> Why not create __builtin_vector_hadd(a, b) which would select the CPU
>> specific instruction or a fallback generic implementation?
>>
>> Many thanks,
>>...
