thr3ads.net - search: "__builtin_ia32

Displaying 2 results from an estimated 2 matches for "__builtin_ia32_haddps".

2020 Aug 19

Question about llvm vectors

...operands. /// The horizontal sums of the values are stored in the upper bits of the /// destination. /// \returns A 128-bit vector of [4 x float] containing the horizontal sums of /// both operands. static __inline__ __m128 __DEFAULT_FN_ATTRS _mm_hadd_ps(__m128 __a, __m128 __b) { return __builtin_ia32_haddps((__v4sf)__a, (__v4sf)__b); } Here clang will translate _mm_hadd_ps to a CPU specific feature. Why not create __builtin_vector_hadd(a, b) which would select the CPU specific instruction or a fallback generic implementation? Many thanks, Alex -------------- next part -------------- An HTML attachme...

Question about llvm vectors

2020 Aug 20

Question about llvm vectors

Hi Craig, Thank you very much for your answer. I did not want to discuss exactly the semantic and name of one operation but instead raise the question "would it be beneficial to have more vector builtins?". You wrote that the compiler will recognize a pattern and replace it by __builtin_ia32_haddps when possible, but how can I be sure of that? I would have to disassemble the generated code right? It is very impractical isn'it? And it leads me to understand that each CPU target has a bank of patterns which it can recognize but wouldn't it be very similar to have advanced generic vector...

search for: __builtin_ia32_haddps